Table of Contents

  • 1  Overview
  • 2  Environment setting and packages
  • 3  Data summary
  • 4  Preprocess scRNAseq data using Cell Ranger
    • 4.1  Fastq files and BCL naming convention
    • 4.2  Download reference genome
    • 4.3  Setup the command for cellranger count
  • 5  Quality Control and Analysis using Seurat
    • 5.1  QC and clustering for each set (control as an example)
    • 5.2  Standard pre-process workflow
  • 6  Normalizing the data
    • 6.1  Identification of highly variable features
    • 6.2  Cell cycle correction
    • 6.3  Scaling the data
  • 7  Clustering
    • 7.1  Perform linear dimensional reduction and visualization
    • 7.2  Determine the 'dimensionality' of the dataset
    • 7.3  Cluster the cells
    • 7.4  Run non-linear dimensional reduction (UMAP/tSNE)
  • 8  DE analysis (using both Seurat and edgeR)
    • 8.1  Seurat-- Find differentially expressed features (cluster biomarkers)
    • 8.2  DE using edgeR
  • 9  Integration analysis
    • 9.1  Setup Seurat object for 3 datasets
    • 9.2  Dataset preprocessing
    • 9.3  Perform integration
    • 9.4  Perform an integrated analysis
    • 9.5  Identify conserved cell type markers
    • 9.6  Identify differentially expressed genes between conditions

Overview

This Jupyter Notebook provides a workflow for processing single-cell RNA-seq (scRNA-seq) data. Single-cell sequencing analysis is a relatively new and rapidly changing area. Many tools and methods have been developed for single-cell data, and scRNA-seq analysis covers many interesting topics. In this Notebook, we will mainly focus on the following selected topics:

  1. Pre-processing scRNA-seq FASTQ files using Cell Ranger and QC
  2. Clustering
  3. Differential Expression (DE) analysis
  4. Integration analysis of multiple single-cell datasets

Environment setting and packages

We are using the SoS kernel in this notebook, so we can mix R and bash commands and switch between them with %use. We will set the working directory in both R and bash again:

In [1]:
%use r
mydir<- getwd()
setwd(mydir)
In [2]:
%use bash
mydir=`pwd`
cd $mydir

Before we can run any analysis, we need to load the necessary R packages. Each package is loaded with the library() function.

Loading the packages may generate multiple messages, such as "The following objects are masked from XXX". This is normal, and you can ignore them.

In [1]:
%use r
library(DESeq2)
library(dplyr)
library(edgeR)
library(Seurat)
library(cowplot)
library(MetaDE)
library(patchwork)
Loading required package: S4Vectors

Loading required package: stats4

Loading required package: BiocGenerics


Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, basename, cbind, colnames, dirname, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
    pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
    tapply, union, unique, unsplit, which.max, which.min



Attaching package: ‘S4Vectors’


The following objects are masked from ‘package:base’:

    I, expand.grid, unname


Loading required package: IRanges

Loading required package: GenomicRanges

Loading required package: GenomeInfoDb

Loading required package: SummarizedExperiment

Loading required package: MatrixGenerics

Loading required package: matrixStats


Attaching package: ‘MatrixGenerics’


The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars


Loading required package: Biobase

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.



Attaching package: ‘Biobase’


The following object is masked from ‘package:MatrixGenerics’:

    rowMedians


The following objects are masked from ‘package:matrixStats’:

    anyMissing, rowMedians



Attaching package: ‘dplyr’


The following object is masked from ‘package:Biobase’:

    combine


The following object is masked from ‘package:matrixStats’:

    count


The following objects are masked from ‘package:GenomicRanges’:

    intersect, setdiff, union


The following object is masked from ‘package:GenomeInfoDb’:

    intersect


The following objects are masked from ‘package:IRanges’:

    collapse, desc, intersect, setdiff, slice, union


The following objects are masked from ‘package:S4Vectors’:

    first, intersect, rename, setdiff, setequal, union


The following objects are masked from ‘package:BiocGenerics’:

    combine, intersect, setdiff, union


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Loading required package: limma


Attaching package: ‘limma’


The following object is masked from ‘package:DESeq2’:

    plotMA


The following object is masked from ‘package:BiocGenerics’:

    plotMA


The legacy packages maptools, rgdal, and rgeos, underpinning this package
will retire shortly. Please refer to R-spatial evolution reports on
https://r-spatial.org/r/2023/05/15/evolution4.html for details.
This package is now running under evolution status 0 

Attaching SeuratObject


Attaching package: ‘Seurat’


The following object is masked from ‘package:SummarizedExperiment’:

    Assays


Loading required package: survival

Loading required package: impute

Loading required package: combinat


Attaching package: ‘combinat’


The following object is masked from ‘package:utils’:

    combn


Loading required package: tools


Attaching package: ‘patchwork’


The following object is masked from ‘package:cowplot’:

    align_plots


Data summary

The data used for this notebook are single-cell RNA-seq data from human prostate carcinoma-associated fibroblasts. Carcinoma-associated fibroblasts (CAFs) are a heterogeneous group of cells within the tumor microenvironment (TME) that can promote tumorigenesis in the prostate. Please refer to this paper if you would like to know the details of this dataset.

Preprocess scRNAseq data using Cell Ranger

Cell Ranger is developed and maintained by 10x Genomics. It provides a set of pipelines to process and analyze raw scRNA-seq data. Below, we provide some example code for using Cell Ranger; more details can be found on the 10x Genomics website here.

If you would like to try the examples yourself, you will need to download the data and modify the output paths to point to your own directory.

Fastq files and BCL naming convention

The FASTQ data files are already prepared and saved in "/anvil/projects/x-tra220018/2022/datasets/single_cellData/Ratliff_CAF/". You do not have permission to make changes in this directory, but you can try the code by changing the output path to your own directory. Below are the files in this directory:

030386_Control-CAF_S1_run656_L001_R1_001.fastq.gz
030386_Control-CAF_S1_run656_L001_R2_001.fastq.gz
030386_Control-CAF_S1_run656_L002_R1_001.fastq.gz
030386_Control-CAF_S1_run656_L002_R2_001.fastq.gz
030386_Control-CAF_S1_run656_L003_R1_001.fastq.gz
030386_Control-CAF_S1_run656_L003_R2_001.fastq.gz
030386_Control-CAF_S1_run656_L004_R1_001.fastq.gz
030386_Control-CAF_S1_run656_L004_R2_001.fastq.gz
030386_Control-CAF_S1_run659_L001_R1_001.fastq.gz
030386_Control-CAF_S1_run659_L001_R2_001.fastq.gz
030386_Control-CAF_S1_run659_L002_R1_001.fastq.gz
030386_Control-CAF_S1_run659_L002_R2_001.fastq.gz
030386_Control-CAF_S1_run659_L003_R1_001.fastq.gz
030386_Control-CAF_S1_run659_L003_R2_001.fastq.gz
030386_Control-CAF_S1_run659_L004_R1_001.fastq.gz
030386_Control-CAF_S1_run659_L004_R2_001.fastq.gz

Please note that Cell Ranger requires the input FASTQ files to follow the naming convention of bcl2fastq or mkfastq, e.g. Sample_S1_L00X_R1_001.fastq.gz. Briefly, FASTQ files taken by cellranger count are named with the sample name, the sample number, the flow cell lane, and the read. The file extension is '*.fastq.gz'. An example FASTQ file name looks like this: samplename_S1_L001_R1_001.fastq.gz.

If the downloaded *.fastq.gz files are not in this naming convention, you will need to manually rename all files before you can call `cellranger count`.
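
As a minimal sketch of such a renaming step, the snippet below builds the bcl2fastq-style name for each file and prints the mv commands as a dry run. The input pattern (mysample_1.fastq.gz / mysample_2.fastq.gz) and the helper name illumina_name are assumptions made for illustration; they are not files or tools from this workshop, so adjust the parsing to your own file names.

```shell
# Hypothetical input layout: <sample>_1.fastq.gz and <sample>_2.fastq.gz
# (an assumed pattern for illustration; adjust to your own files).

# Print the bcl2fastq-style name for one file.
#   $1 = original file name, $2 = sample number (default 1), $3 = lane (default 1)
illumina_name() {
    local file="$1" sample_no="${2:-1}" lane="${3:-1}"
    local base="${file%.fastq.gz}"   # strip the extension
    local read="${base##*_}"         # trailing read number: 1 or 2
    local sample="${base%_*}"        # everything before the read number
    printf '%s_S%d_L%03d_R%d_001.fastq.gz\n' "$sample" "$sample_no" "$lane" "$read"
}

# Dry run: print the mv commands instead of executing them.
for f in mysample_1.fastq.gz mysample_2.fastq.gz; do
    echo mv "$f" "$(illumina_name "$f")"
done
```

Once the printed commands look right, removing the echo performs the actual renaming.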

Here is the explanation of each element in the name:

  • samplename: The name of the sample provided in the sample sheet. If a sample name is not available, the file name uses the sample ID instead.
  • S1: The number of the sample based on the order that samples are listed in the sample sheet, starting with 1.
  • L001: The lane number of the flow cell, starting with lane 1, to the number of lanes supported.
  • R1: The read index. R1 indicates Read 1. R2 indicates Read 2 of a paired-end run.
  • 001: The last portion of the file name is always 001.
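
To make the elements above concrete, here is a small sketch that splits a file name into its parts and rejects names that do not fit. The function name parse_fastq_name and the strict pattern are our own assumptions: the pattern only accepts R1/R2 reads and exactly the five elements listed above, so names with extra tokens are reported as non-matching.

```shell
# Split a FASTQ file name into the elements described above.
# Assumption: a strict pattern, samplename_S<number>_L<3 digits>_R<1|2>_001.fastq.gz
parse_fastq_name() {
    local re='^(.+)_S([1-9][0-9]*)_L([0-9]{3})_(R[12])_001\.fastq\.gz$'
    if printf '%s\n' "$1" | grep -Eq "$re"; then
        # Reuse the same pattern to pull out the captured groups.
        printf '%s\n' "$1" | sed -E "s/$re/sample=\1 sample_no=\2 lane=\3 read=\4/"
    else
        echo "not in the expected naming convention: $1" >&2
        return 1
    fi
}

parse_fastq_name samplename_S1_L001_R1_001.fastq.gz
# prints: sample=samplename sample_no=1 lane=001 read=R1
```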

Special Note: L001 and L002 are indices of different Illumina sequencing lanes or batches, and we can use these indices, together with the sample indices, to distinguish treatment groups. If we want to analyze all samples in one treatment group together, they are assigned the same sample number (e.g. S1) and different lane numbers (e.g. L001 and L002). (NOTE: Reads cannot be assigned sample number 0 or lane number 0. Files numbered 0 will be excluded from downstream analysis.) For example, if there are 2 treatment groups with 3 replicates each, we index the three replicates in group 1 as S1_L001, S1_L002, and S1_L003, and the replicates in group 2 as S2_L001, S2_L002, and S2_L003.

Please refer to Illumina or bcl2fastq User Guide for more details.

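Download reference genome

This code downloads the human reference genome from the 10x Genomics website. We used the GRCh38 reference package (release 2020-A, refdata-gex-GRCh38-2020-A). The commands are commented out here; if you need your own copy, uncomment them and set ref_path to a directory you can write to.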

In [4]:
%use bash
# ref_path=/anvil/projects/x-tra220018/2022/ref_files/cellranger
# wget https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2020-A.tar.gz -P $ref_path
# tar -zxvf $ref_path/refdata-gex-GRCh38-2020-A.tar.gz -C $ref_path
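
After extracting the archive, it can be worth checking that the reference folder looks complete before starting a long run. The sketch below is only an illustration: the helper name check_reference is ours, and the list of expected entries (reference.json, fasta, genes, star) is an assumption about the layout of the 10x reference package that you should compare against your own extracted folder.

```shell
# Check that a reference folder contains the entries we expect.
# Assumption: the expected entries below reflect the usual layout of a
# 10x reference package; verify against your own extracted folder.
#   $1 = path to the extracted reference folder
check_reference() {
    local ref="$1" missing=0 item
    for item in reference.json fasta genes star; do
        if [ ! -e "$ref/$item" ]; then
            echo "missing: $ref/$item" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage with the path from this notebook (uncomment on your own system):
# check_reference "$ref_path/refdata-gex-GRCh38-2020-A" && echo "reference looks complete"
```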

Setup the command for cellranger count

We will use the cellranger count command to generate single-cell feature counts from FASTQ files. The following code uses the FASTQ files listed above to map the RNA-seq reads to the downloaded GRCh38 reference genome. Below, we show an example of a cellranger count script that will be submitted to the bash shell on the server. The lines before cellranger count configure the running environment of the bash script.

cellranger needs to run on a server's backend. We are already on the backend once we have typed startnode, so we do not need to worry about the setup. If you plan to run this program on your institution's server, please read the following part.

To submit Cell Ranger jobs to the backend of a server, you need to set up the running environment in your job script. Many supercomputers use either the SLURM or the PBS submission system. Anvil and other Purdue-based systems use SLURM. You do not need to run the code below; these headers are simply shown as examples. If you want to use your own institution's supercomputer after the workshop, you can use these headers as a reference for your own job submission scripts.

Examples of job submission script headers are shown below. Each starts with the interpreter line (#!/bin/sh -l in the SLURM example), followed by the necessary job parameters. You can refer to this page for SLURM job submission scripts, and this page for creating a PBS bash script. You will need to modify these parameters according to the computing environment on your institution's server.

In [5]:
#!/bin/sh -l
# SLURM job submission script (the interpreter line must be the first line)
#SBATCH -p standard
#SBATCH -N 1
#SBATCH -n 40
#SBATCH --time=4:00:00
In [6]:
#!/bin/sh
# PBS job submission script (the interpreter line must be the first line)
#PBS -q long
#PBS -l nodes=1:ppn=10
#PBS -l walltime=4:00:00
#PBS -M XXX@purdue.edu
# cd $PBS_O_WORKDIR
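
If you prepare several jobs, it can be convenient to generate the header from a few parameters. The following is a sketch only: the helper name write_slurm_script is hypothetical, and the partition, task count, and wall time are simply the example values from the SLURM header above. It writes the interpreter line first and the #SBATCH lines before any command, which is where the scheduler expects them.

```shell
# Write a SLURM job script header to a file.
#   $1 = output file, $2 = partition, $3 = number of tasks, $4 = wall time
write_slurm_script() {
    cat > "$1" <<EOF
#!/bin/sh -l
#SBATCH -p $2
#SBATCH -N 1
#SBATCH -n $3
#SBATCH --time=$4

# commands to run go below this line (e.g. module load, cellranger count)
EOF
}

job=$(mktemp)
write_slurm_script "$job" standard 40 4:00:00
head -n 1 "$job"    # shows the interpreter line of the generated script
rm -f "$job"
```

The generated file would then be submitted with the scheduler's own submission command on your system.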

We are already on the backend, so we can run cellranger directly. We load the pre-installed software using the module load command. (If you run it on other supercomputers, you will need to install cellranger first.) The actual processing is done by the command that begins with cellranger count.

The cellranger arguments are split across multiple lines for easier reading.

Important arguments for cellranger count are listed below.

  • --id: a unique run ID, used to name the output folder.
  • --sample: the sample name. It must match the sample name at the start of the FASTQ file names.
  • --transcriptome: the path to the reference genome folder.
  • --fastqs: the path to the folder containing the FASTQ files.
  • --localcores: the number of cores to use. By default, cellranger uses all cores available on the server.
  • --expect-cells: the expected number of recovered cells. By default, this number is set to 3000.
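
To see how these arguments fit together, the sketch below assembles the command from its parts and prints it instead of running it. The helper name build_count_cmd and the /path/to/... values are placeholders for illustration; the ID, sample name, 30 cores, and 5000 expected cells are the values mentioned in this notebook.

```shell
# Assemble the cellranger count arguments and print the resulting command
# (a dry run; nothing is executed).
#   $1 = id, $2 = sample, $3 = reference path, $4 = FASTQ folder,
#   $5 = number of cores, $6 = expected number of cells
build_count_cmd() {
    local id="$1" sample="$2" ref="$3" fastqs="$4" cores="$5" cells="$6"
    set -- cellranger count \
        --id="$id" \
        --sample="$sample" \
        --transcriptome="$ref" \
        --fastqs="$fastqs" \
        --localcores="$cores" \
        --expect-cells="$cells"
    printf '%s\n' "$*"
}

# Placeholder paths; replace them with your own locations.
build_count_cmd Control-CAF 030386_Control-CAF \
    /path/to/refdata-gex-GRCh38-2020-A /path/to/fastqs 30 5000
```

Printing the command first makes it easy to spot a wrong path or a sample name that does not match the FASTQ files before committing to a long run.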

The version of cellranger used here (6.1.1) does not provide an option for specifying the output directory. We therefore cd to the output folder before calling cellranger, and then cd back to the notebook directory once it is done. (unset MPLBACKEND removes an environment variable set by Jupyter Notebook that confuses cellranger.)
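
An alternative to the cd-and-cd-back pattern is to run the command in a subshell, so the notebook's working directory is never changed, even if the command fails. This is a sketch under our own assumptions: run_in_dir is a hypothetical helper, and pwd stands in for the real cellranger call so the example can run anywhere.

```shell
# Run a command inside a directory without changing the caller's directory.
#   $1 = directory (created if missing), remaining arguments = command to run
run_in_dir() {
    local dir="$1"
    shift
    mkdir -p "$dir" || return 1
    # The parentheses start a subshell; its cd does not affect the caller.
    ( cd "$dir" && "$@" )
}

demo_dir=$(mktemp -d)
start_dir=$(pwd)
run_in_dir "$demo_dir/cellranger" pwd    # prints the output directory
[ "$(pwd)" = "$start_dir" ] && echo "still in the starting directory"
rm -rf "$demo_dir"
```

With cellranger loaded, the same helper could be called as run_in_dir ./data/cellranger cellranger count followed by the arguments listed above.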

The code below takes about 30 minutes to finish. If you see the message "Pipeline failed", it is likely that you did not request enough cores to complete the job. Please make sure you requested at least 30 cores and rerun the code.
In [51]:
%use bash
mydir=`pwd`

# Jupyter sets MPLBACKEND, which confuses cellranger
unset MPLBACKEND
# export the variable so that cellranger (a child process) can see it
export MRO_DISK_SPACE_CHECK=disable
module load biocontainers
module load cellranger/6.1.1

ref_path=/anvil/projects/x-tra220018/ref_files/cellranger/
data_path=/anvil/projects/x-tra220018/current/datasets/single_cellData/example
out_path=./data/cellranger
# cellranger writes its output folder into the current directory
cd "$out_path"

cellranger count --id=Control-CAF \
                --sample=030386_Control-CAF \
                --transcriptome=$ref_path/refdata-gex-GRCh38-2020-A/ \
                --fastqs=$data_path \
                --expect-cells=5000

# return to the notebook directory
cd "$mydir"
User guides for each biocontainer module can be found in
https://www.rcac.purdue.edu/knowledge/biocontainers 
Martian Runtime - v4.0.6
2025-03-05 01:01:29 [jobmngr] WARNING: configured to use 35GB of local memory, but only 24.0GB is currently available.
Serving UI at http://a204.anvil.rcac.purdue.edu:42285?auth=0U_95By4wTHSoeUKuQAbhVf-kTe9UvHqpKFLw7hMBoM

Running preflight checks (please wait)...
Checking sample info...
Checking FASTQ folder...
Checking reference...
Checking reference_path (/anvil/projects/x-tra220018/ref_files/cellranger/refdata-gex-GRCh38-2020-A) on a204.anvil.rcac.purdue.edu...
Checking optional arguments...
mrc: v4.0.6

mrp: v4.0.6

Anaconda: Python 3.8.2

numpy: 1.19.2

scipy: 1.6.2

pysam: 0.16.0.1

h5py: 3.2.1

pandas: 1.2.4

STAR: 2.7.2a

samtools: samtools 1.10
Using htslib 1.10.2
Copyright (C) 2019 Genome Research Ltd.

2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.SANITIZE_MAP_CALLS
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.DISABLE_BAMS
2025-03-05 01:01:34 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.DISABLE_BAMS.fork0.chnk0.main
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.WRITE_GENE_INDEX
2025-03-05 01:01:34 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.WRITE_GENE_INDEX.fork0.chnk0.main
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.FULL_COUNT_INPUTS.WRITE_GENE_INDEX
2025-03-05 01:01:34 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.FULL_COUNT_INPUTS.WRITE_GENE_INDEX.fork0.chnk0.main
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MAKE_FULL_CONFIG._MAKE_VDJ_CONFIG
2025-03-05 01:01:34 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MAKE_FULL_CONFIG._MAKE_VDJ_CONFIG.fork0.chnk0.main
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR._GEM_WELL_CHEMISTRY_DETECTOR.DETECT_COUNT_CHEMISTRY
2025-03-05 01:01:34 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR._GEM_WELL_CHEMISTRY_DETECTOR.DETECT_COUNT_CHEMISTRY.fork0.chnk0.main
2025-03-05 01:01:34 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.DISABLE_BAMS
2025-03-05 01:01:34 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MAKE_FULL_CONFIG._MAKE_VDJ_CONFIG
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_T_GEM_WELL_PROCESSOR.MULTI_SETUP_CHUNKS
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_B_GEM_WELL_PROCESSOR.MULTI_SETUP_CHUNKS
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_B_GEM_WELL_PROCESSOR.SC_VDJ_CONTIG_ASSEMBLER.MAKE_SHARD
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_T_GEM_WELL_PROCESSOR.SC_VDJ_CONTIG_ASSEMBLER.MAKE_SHARD
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_B_GEM_WELL_PROCESSOR.SC_VDJ_CONTIG_ASSEMBLER.BARCODE_CORRECTION
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_T_GEM_WELL_PROCESSOR.SC_VDJ_CONTIG_ASSEMBLER.BARCODE_CORRECTION
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_T_GEM_WELL_PROCESSOR.SC_VDJ_CONTIG_ASSEMBLER.RUST_BRIDGE
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_B_GEM_WELL_PROCESSOR.SC_VDJ_CONTIG_ASSEMBLER.RUST_BRIDGE
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_T_GEM_WELL_PROCESSOR.SC_VDJ_CONTIG_ASSEMBLER.ASSEMBLE_VDJ
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_T_GEM_WELL_PROCESSOR.SC_VDJ_CONTIG_ASSEMBLER.MERGE_METRICS
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_B_GEM_WELL_PROCESSOR.SC_VDJ_CONTIG_ASSEMBLER.ASSEMBLE_VDJ
2025-03-05 01:01:34 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_B_GEM_WELL_PROCESSOR.SC_VDJ_CONTIG_ASSEMBLER.MERGE_METRICS
2025-03-05 01:01:59 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.FULL_COUNT_INPUTS.WRITE_GENE_INDEX
2025-03-05 01:01:59 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR.PARSE_TARGET_FEATURES
2025-03-05 01:01:59 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR.PARSE_TARGET_FEATURES.fork0.chnk0.main
2025-03-05 01:01:59 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.WRITE_GENE_INDEX
2025-03-05 01:02:01 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR.PARSE_TARGET_FEATURES
2025-03-05 01:02:20 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR._GEM_WELL_CHEMISTRY_DETECTOR.DETECT_COUNT_CHEMISTRY
2025-03-05 01:02:20 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR._GEM_WELL_CHEMISTRY_DETECTOR.CHECK_BARCODES_COMPATIBILITY
2025-03-05 01:02:20 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR._GEM_WELL_CHEMISTRY_DETECTOR.CHECK_BARCODES_COMPATIBILITY.fork0.chnk0.main
2025-03-05 01:02:20 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR._GEM_WELL_CHEMISTRY_DETECTOR.CHECK_BARCODES_COMPATIBILITY
2025-03-05 01:02:20 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR.COMBINE_GEM_WELL_CHEMISTRIES
2025-03-05 01:02:20 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR.COMBINE_GEM_WELL_CHEMISTRIES.fork0.chnk0.main
2025-03-05 01:02:20 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_CHEMISTRY_DETECTOR.COMBINE_GEM_WELL_CHEMISTRIES
2025-03-05 01:02:20 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR.MULTI_SETUP_CHUNKS
2025-03-05 01:02:20 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR.MULTI_SETUP_CHUNKS.fork0.chnk0.main
2025-03-05 01:02:20 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.SPLIT_VDJ_INPUTS
2025-03-05 01:02:21 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR.MULTI_SETUP_CHUNKS
2025-03-05 01:02:21 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.MAKE_SHARD
2025-03-05 01:02:21 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.MAKE_SHARD.fork0.split
2025-03-05 01:02:27 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.MAKE_SHARD
2025-03-05 01:02:27 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.MAKE_SHARD.fork0.chnk0.main
2025-03-05 01:08:06 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.MAKE_SHARD
2025-03-05 01:08:06 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.MAKE_SHARD.fork0.join
2025-03-05 01:08:06 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.MAKE_SHARD
2025-03-05 01:08:06 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.BARCODE_CORRECTION
2025-03-05 01:08:06 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.BARCODE_CORRECTION.fork0.split
2025-03-05 01:08:06 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.BARCODE_CORRECTION
2025-03-05 01:08:06 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.BARCODE_CORRECTION.fork0.chnk0.main
2025-03-05 01:08:06 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.BARCODE_CORRECTION.fork0.chnk1.main
2025-03-05 01:08:06 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.BARCODE_CORRECTION.fork0.chnk2.main
2025-03-05 01:08:13 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.BARCODE_CORRECTION
2025-03-05 01:08:13 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.BARCODE_CORRECTION.fork0.join
2025-03-05 01:08:13 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.BARCODE_CORRECTION
2025-03-05 01:08:13 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.WRITE_BARCODE_INDEX
2025-03-05 01:08:13 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.WRITE_BARCODE_INDEX.fork0.chnk0.main
2025-03-05 01:08:13 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER._SLFE_PARTIAL_FIRST_PASS.SUBSAMPLE_BARCODES
2025-03-05 01:08:13 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.SET_ALIGNER_SUBSAMPLE_RATE
2025-03-05 01:08:13 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.SET_ALIGNER_SUBSAMPLE_RATE.fork0.chnk0.main
2025-03-05 01:08:13 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER._SLFE_PARTIAL_FIRST_PASS.INITIAL_ALIGN_AND_COUNT
2025-03-05 01:08:13 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.SET_ALIGNER_SUBSAMPLE_RATE
2025-03-05 01:08:13 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER._SLFE_PARTIAL_FIRST_PASS.SET_TARGETED_UMI_FILTER
2025-03-05 01:08:13 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.ALIGN_AND_COUNT
2025-03-05 01:08:13 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.ALIGN_AND_COUNT.fork0.split
2025-03-05 01:08:13 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.ALIGN_AND_COUNT
2025-03-05 01:08:13 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.ALIGN_AND_COUNT.fork0.chnk0.main
2025-03-05 01:08:13 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.ALIGN_AND_COUNT.fork0.chnk1.main
2025-03-05 01:08:13 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.ALIGN_AND_COUNT.fork0.chnk2.main
2025-03-05 01:08:13 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.WRITE_BARCODE_INDEX
2025-03-05 01:14:15 [runtime] (update)          ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.ALIGN_AND_COUNT.fork0 chunks running (0/3 completed)
2025-03-05 01:20:29 [runtime] (update)          ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.ALIGN_AND_COUNT.fork0 chunks running (2/3 completed)
2025-03-05 01:24:37 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.ALIGN_AND_COUNT
2025-03-05 01:24:37 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.ALIGN_AND_COUNT.fork0.join
2025-03-05 01:24:37 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.ALIGN_AND_COUNT
2025-03-05 01:24:37 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.WRITE_POS_BAM
2025-03-05 01:24:37 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.WRITE_POS_BAM.fork0.split
2025-03-05 01:24:37 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.COLLATE_METRICS
2025-03-05 01:24:37 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.COLLATE_METRICS.fork0.split
2025-03-05 01:24:37 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.WRITE_H5_MATRIX
2025-03-05 01:24:37 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.WRITE_H5_MATRIX.fork0.chnk0.main
2025-03-05 01:24:37 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.WRITE_MATRIX_MARKET
2025-03-05 01:24:37 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.WRITE_MATRIX_MARKET.fork0.chnk0.main
2025-03-05 01:24:37 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.WRITE_BARCODE_SUMMARY
2025-03-05 01:24:37 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.WRITE_BARCODE_SUMMARY.fork0.chnk0.main
2025-03-05 01:24:37 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.COLLATE_METRICS
2025-03-05 01:24:37 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.COLLATE_METRICS.fork0.chnk0.main
2025-03-05 01:24:38 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.WRITE_POS_BAM
2025-03-05 01:24:38 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.WRITE_POS_BAM.fork0.chnk0.main
2025-03-05 01:24:38 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.WRITE_POS_BAM.fork0.chnk1.main
2025-03-05 01:24:38 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.WRITE_POS_BAM.fork0.chnk2.main
2025-03-05 01:24:41 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.COLLATE_METRICS
2025-03-05 01:24:41 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.COLLATE_METRICS.fork0.join
2025-03-05 01:24:41 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.COLLATE_METRICS
2025-03-05 01:24:41 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.MERGE_METRICS
2025-03-05 01:24:41 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.MERGE_METRICS.fork0.chnk0.main
2025-03-05 01:24:41 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.MERGE_METRICS
2025-03-05 01:24:47 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.WRITE_H5_MATRIX
2025-03-05 01:24:47 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.FILTER_BARCODES
2025-03-05 01:24:47 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.FILTER_BARCODES.fork0.split
2025-03-05 01:24:50 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.WRITE_MATRIX_MARKET
2025-03-05 01:24:53 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.FILTER_BARCODES
2025-03-05 01:24:53 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.FILTER_BARCODES.fork0.join
2025-03-05 01:25:45 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._MATRIX_COMPUTER.WRITE_BARCODE_SUMMARY
2025-03-05 01:26:28 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.FILTER_BARCODES
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_T_GEM_WELL_PROCESSOR.SC_VDJ_CONTIG_ASSEMBLER.HANDLE_GEX_CELLS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.VDJ_B_GEM_WELL_PROCESSOR.SC_VDJ_CONTIG_ASSEMBLER.HANDLE_GEX_CELLS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_B_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.RUN_ENCLONE
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_T_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.RUN_ENCLONE
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.WRITE_MOLECULE_INFO
2025-03-05 01:26:28 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.WRITE_MOLECULE_INFO.fork0.chnk0.main
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.INFER_GEM_WELL_THROUGHPUT
2025-03-05 01:26:28 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.INFER_GEM_WELL_THROUGHPUT.fork0.split
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.VDJ_T_REPORTER.VLOUPE_PREPROCESS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_B_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.FILL_CLONOTYPE_INFO
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_B_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.WRITE_CONSENSUS_TXT
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_T_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.WRITE_CLONOTYPE_OUTS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_T_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.FILL_CLONOTYPE_INFO
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_B_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.WRITE_CLONOTYPE_OUTS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_T_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.WRITE_CONSENSUS_TXT
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.VDJ_B_REPORTER.VLOUPE_PREPROCESS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_B_CLONOTYPE_ASSIGNER.HANDLE_NO_VDJ_REF
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_B_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.WRITE_CONSENSUS_BAM
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COPY_VDJ_REFERENCE
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_T_CLONOTYPE_ASSIGNER.HANDLE_NO_VDJ_REF
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_T_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.WRITE_ANN_CSV
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_B_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.WRITE_ANN_CSV
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_T_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.WRITE_CONSENSUS_BAM
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_B_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.WRITE_CONCAT_REF_OUTS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_T_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.WRITE_CONCAT_REF_OUTS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.VDJ_T_REPORTER.WRITE_CONTIG_OUTS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.VDJ_B_REPORTER.WRITE_CONTIG_OUTS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_T_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.CREATE_AIRR_TSV
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.VDJ_T_REPORTER.REPORT_CONTIGS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.VDJ_B_CLONOTYPE_ASSIGNER.CLONOTYPE_ASSIGNER.CREATE_AIRR_TSV
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.VDJ_B_REPORTER.REPORT_CONTIGS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.VDJ_T_REPORTER.SUMMARIZE_VDJ_REPORTS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.VDJ_B_REPORTER.SUMMARIZE_VDJ_REPORTS
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.VDJ_T_REPORTER.WRITE_CONTIG_PROTO
2025-03-05 01:26:28 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.VDJ_B_REPORTER.WRITE_CONTIG_PROTO
2025-03-05 01:26:32 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.INFER_GEM_WELL_THROUGHPUT
2025-03-05 01:26:32 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.INFER_GEM_WELL_THROUGHPUT.fork0.join
2025-03-05 01:26:36 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.INFER_GEM_WELL_THROUGHPUT
2025-03-05 01:26:36 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.CALL_TAGS_MARGINAL
2025-03-05 01:26:36 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.CALL_TAGS_MARGINAL.fork0.split
2025-03-05 01:26:39 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.CALL_TAGS_MARGINAL
2025-03-05 01:26:39 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.CALL_TAGS_MARGINAL.fork0.join
2025-03-05 01:27:32 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.CALL_TAGS_MARGINAL
2025-03-05 01:27:39 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.WRITE_MOLECULE_INFO
2025-03-05 01:27:39 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.CALL_TAGS_JIBES
2025-03-05 01:27:39 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.CALL_TAGS_JIBES.fork0.split
2025-03-05 01:27:39 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUBSAMPLE_READS
2025-03-05 01:27:39 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUBSAMPLE_READS.fork0.split
2025-03-05 01:27:42 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUBSAMPLE_READS
2025-03-05 01:27:42 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUBSAMPLE_READS.fork0.chnk0.main
2025-03-05 01:27:42 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUBSAMPLE_READS.fork0.chnk1.main
2025-03-05 01:27:42 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUBSAMPLE_READS.fork0.chnk2.main
2025-03-05 01:27:42 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUBSAMPLE_READS.fork0.chnk3.main
2025-03-05 01:27:42 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.WRITE_POS_BAM
2025-03-05 01:27:42 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.WRITE_POS_BAM.fork0.join
2025-03-05 01:27:43 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.CALL_TAGS_JIBES
2025-03-05 01:27:43 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.CALL_TAGS_JIBES.fork0.join
2025-03-05 01:27:48 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.CALL_TAGS_JIBES
2025-03-05 01:27:48 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.DETERMINE_SAMPLE_ASSIGNMENTS
2025-03-05 01:27:48 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.DETERMINE_SAMPLE_ASSIGNMENTS.fork0.chnk0.main
2025-03-05 01:27:49 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.DETERMINE_SAMPLE_ASSIGNMENTS
2025-03-05 01:27:49 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.MULTI_WRITE_PER_SAMPLE_MATRICES
2025-03-05 01:27:49 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.MULTI_COLLATE_PER_SAMPLE_METRICS
2025-03-05 01:27:49 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.MULTI_WRITE_PER_SAMPLE_BAM
2025-03-05 01:27:49 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.COMPUTE_EXTRA_MULTIPLEXING_METRICS
2025-03-05 01:27:49 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.COMPUTE_EXTRA_MULTIPLEXING_METRICS.fork0.split
2025-03-05 01:27:49 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.MULTI_WRITE_PER_SAMPLE_MOLECULE_INFO
2025-03-05 01:27:49 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.STRUCTIFY_PER_SAMPLE_OUTS
2025-03-05 01:27:49 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.STRUCTIFY_PER_SAMPLE_OUTS.fork0.chnk0.main
2025-03-05 01:27:50 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.STRUCTIFY_PER_SAMPLE_OUTS
2025-03-05 01:27:50 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.DISABLE_FEATURE_STAGES
2025-03-05 01:27:50 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.DISABLE_FEATURE_STAGES.fork0.chnk0.main
2025-03-05 01:27:50 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.DISABLE_FEATURE_STAGES
2025-03-05 01:27:50 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.ANALYZER_PREFLIGHT
2025-03-05 01:27:50 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.ANALYZER_PREFLIGHT.fork0.chnk0.main
2025-03-05 01:27:50 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER._TARGETED_ANALYZER.SUBSAMPLE_OFF_TARGET_READS
2025-03-05 01:27:50 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER._TARGETED_ANALYZER.SUBSAMPLE_ON_TARGET_READS
2025-03-05 01:27:50 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER._ANTIBODY_ANALYZER.SUMMARIZE_ANTIBODY_ANALYSIS
2025-03-05 01:27:50 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER._ANTIBODY_ANALYZER.CALL_ANTIBODIES
2025-03-05 01:27:51 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.COMPUTE_EXTRA_MULTIPLEXING_METRICS
2025-03-05 01:27:51 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.COMPUTE_EXTRA_MULTIPLEXING_METRICS.fork0.join
2025-03-05 01:27:52 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.ANALYZER_PREFLIGHT
2025-03-05 01:27:52 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.PREPROCESS_MATRIX
2025-03-05 01:27:52 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.PREPROCESS_MATRIX.fork0.split
2025-03-05 01:27:53 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.COMPUTE_EXTRA_MULTIPLEXING_METRICS
2025-03-05 01:27:53 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.MERGE_METRICS
2025-03-05 01:27:53 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.MERGE_METRICS.fork0.chnk0.main
2025-03-05 01:27:53 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._ASSIGN_TAGS.MERGE_METRICS
2025-03-05 01:27:55 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.WRITE_POS_BAM
2025-03-05 01:27:56 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.PREPROCESS_MATRIX
2025-03-05 01:27:56 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.PREPROCESS_MATRIX.fork0.join
2025-03-05 01:28:02 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.PREPROCESS_MATRIX
2025-03-05 01:28:02 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_PCA
2025-03-05 01:28:02 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_PCA.fork0.split
2025-03-05 01:28:02 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_FBPCA
2025-03-05 01:28:02 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_MULTIGENOME_ANALYSIS
2025-03-05 01:28:02 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_MULTIGENOME_ANALYSIS.fork0.split
2025-03-05 01:28:02 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.CORRECT_CHEMISTRY_BATCH
2025-03-05 01:28:02 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_PCA
2025-03-05 01:28:02 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_PCA.fork0.join
2025-03-05 01:28:04 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_MULTIGENOME_ANALYSIS
2025-03-05 01:28:04 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_MULTIGENOME_ANALYSIS.fork0.join
2025-03-05 01:28:06 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_MULTIGENOME_ANALYSIS
2025-03-05 01:28:07 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_PCA
2025-03-05 01:28:07 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.CHOOSE_DIMENSION_REDUCTION_OUTPUT
2025-03-05 01:28:07 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.CHOOSE_DIMENSION_REDUCTION_OUTPUT.fork0.chnk0.main
2025-03-05 01:28:07 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.CHOOSE_DIMENSION_REDUCTION_OUTPUT
2025-03-05 01:28:08 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_UMAP
2025-03-05 01:28:08 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_UMAP.fork0.split
2025-03-05 01:28:08 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_GRAPH_CLUSTERING
2025-03-05 01:28:08 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_GRAPH_CLUSTERING.fork0.split
2025-03-05 01:28:08 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_TSNE
2025-03-05 01:28:08 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_TSNE.fork0.split
2025-03-05 01:28:08 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS
2025-03-05 01:28:08 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS.fork0.split
2025-03-05 01:28:08 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_GRAPH_CLUSTERING
2025-03-05 01:28:08 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_GRAPH_CLUSTERING.fork0.join
2025-03-05 01:28:08 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_TSNE
2025-03-05 01:28:08 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_TSNE.fork0.chnk0.main
2025-03-05 01:28:08 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_UMAP
2025-03-05 01:28:08 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_UMAP.fork0.chnk0.main
2025-03-05 01:28:10 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS
2025-03-05 01:28:10 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS.fork0.chnk0.main
2025-03-05 01:28:10 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS.fork0.chnk1.main
2025-03-05 01:28:10 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS.fork0.chnk2.main
2025-03-05 01:28:10 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS.fork0.chnk3.main
2025-03-05 01:28:10 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS.fork0.chnk4.main
2025-03-05 01:28:10 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS.fork0.chnk5.main
2025-03-05 01:28:10 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS.fork0.chnk6.main
2025-03-05 01:28:10 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS.fork0.chnk7.main
2025-03-05 01:28:10 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS.fork0.chnk8.main
2025-03-05 01:28:10 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_GRAPH_CLUSTERING
2025-03-05 01:28:14 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_TSNE
2025-03-05 01:28:14 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_TSNE.fork0.join
2025-03-05 01:28:14 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_TSNE
2025-03-05 01:28:16 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS
2025-03-05 01:28:16 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS.fork0.join
2025-03-05 01:28:19 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_KMEANS
2025-03-05 01:28:19 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.COMBINE_CLUSTERING
2025-03-05 01:28:19 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.COMBINE_CLUSTERING.fork0.chnk0.main
2025-03-05 01:28:20 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.COMBINE_CLUSTERING
2025-03-05 01:28:20 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION
2025-03-05 01:28:20 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION.fork0.split
2025-03-05 01:28:20 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION
2025-03-05 01:28:20 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION.fork0.chnk0.main
2025-03-05 01:28:20 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION.fork0.chnk1.main
2025-03-05 01:28:20 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION.fork0.chnk2.main
2025-03-05 01:28:20 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION.fork0.chnk3.main
2025-03-05 01:28:20 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION.fork0.chnk4.main
2025-03-05 01:28:20 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION.fork0.chnk5.main
2025-03-05 01:28:20 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION.fork0.chnk6.main
2025-03-05 01:28:20 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION.fork0.chnk7.main
2025-03-05 01:28:31 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_UMAP
2025-03-05 01:28:31 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_UMAP.fork0.join
2025-03-05 01:28:31 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_UMAP
2025-03-05 01:28:34 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION
2025-03-05 01:28:34 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION.fork0.join
2025-03-05 01:28:34 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUBSAMPLE_READS
2025-03-05 01:28:34 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUBSAMPLE_READS.fork0.join
2025-03-05 01:28:35 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.RUN_DIFFERENTIAL_EXPRESSION
2025-03-05 01:28:35 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.SUMMARIZE_ANALYSIS
2025-03-05 01:28:35 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.SUMMARIZE_ANALYSIS.fork0.split
2025-03-05 01:28:35 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.SUMMARIZE_ANALYSIS
2025-03-05 01:28:35 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.SUMMARIZE_ANALYSIS.fork0.chnk0.main
2025-03-05 01:28:36 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUBSAMPLE_READS
2025-03-05 01:28:36 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUMMARIZE_BASIC_REPORTS
2025-03-05 01:28:36 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUMMARIZE_BASIC_REPORTS.fork0.split
2025-03-05 01:28:37 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.SUMMARIZE_ANALYSIS
2025-03-05 01:28:37 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.SUMMARIZE_ANALYSIS.fork0.join
2025-03-05 01:28:38 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER.SC_RNA_ANALYZER.SUMMARIZE_ANALYSIS
2025-03-05 01:28:38 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.GENERATE_LIBRARY_PLOTS
2025-03-05 01:28:38 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUMMARIZE_BASIC_REPORTS
2025-03-05 01:28:38 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUMMARIZE_BASIC_REPORTS.fork0.join
2025-03-05 01:28:45 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER._CELLS_REPORTER.SUMMARIZE_BASIC_REPORTS
2025-03-05 01:28:45 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.MERGE_METRICS
2025-03-05 01:28:45 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.MERGE_METRICS.fork0.chnk0.main
2025-03-05 01:28:46 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_GEM_WELL_PROCESSOR.COUNT_GEM_WELL_PROCESSOR._BASIC_SC_RNA_COUNTER.MERGE_METRICS
2025-03-05 01:28:46 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER._TARGETED_ANALYZER.CALCULATE_TARGETED_METRICS
2025-03-05 01:28:46 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER._TARGETED_ANALYZER.SUMMARIZE_TARGETED_ANALYSIS
2025-03-05 01:28:46 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER._CRISPR_ANALYZER.CALL_PROTOSPACERS
2025-03-05 01:28:46 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.SUMMARIZE_REPORTS
2025-03-05 01:28:46 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.SUMMARIZE_REPORTS.fork0.chnk0.main
2025-03-05 01:28:46 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER._CRISPR_ANALYZER._PERTURBATIONS_BY_TARGET
2025-03-05 01:28:46 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER._CRISPR_ANALYZER._PERTURBATIONS_BY_FEATURE
2025-03-05 01:28:46 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.COUNT_ANALYZER._CRISPR_ANALYZER.SUMMARIZE_CRISPR_ANALYSIS
2025-03-05 01:28:52 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.SUMMARIZE_REPORTS
2025-03-05 01:28:52 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.CLOUPE_PREPROCESS
2025-03-05 01:28:52 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.CLOUPE_PREPROCESS.fork0.split
2025-03-05 01:28:54 [runtime] (split_complete)  ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.CLOUPE_PREPROCESS
2025-03-05 01:28:54 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.CLOUPE_PREPROCESS.fork0.chnk0.main
2025-03-05 01:29:06 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.CLOUPE_PREPROCESS
2025-03-05 01:29:06 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.CLOUPE_PREPROCESS.fork0.join
2025-03-05 01:29:07 [runtime] (join_complete)   ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.CLOUPE_PREPROCESS
2025-03-05 01:29:07 [runtime] (ready)           ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.CHOOSE_CLOUPE
2025-03-05 01:29:07 [runtime] (run:local)       ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.CHOOSE_CLOUPE.fork0.chnk0.main
2025-03-05 01:29:07 [runtime] (chunks_complete) ID.Control-CAF.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MULTI_REPORTER.CHOOSE_CLOUPE

Outputs:
- Run summary HTML:                         /anvil/scratch/x-liu2302/unit1_demo/data/cellranger/Control-CAF/outs/web_summary.html
- Run summary CSV:                          /anvil/scratch/x-liu2302/unit1_demo/data/cellranger/Control-CAF/outs/metrics_summary.csv
- BAM:                                      /anvil/scratch/x-liu2302/unit1_demo/data/cellranger/Control-CAF/outs/possorted_genome_bam.bam
- BAM index:                                /anvil/scratch/x-liu2302/unit1_demo/data/cellranger/Control-CAF/outs/possorted_genome_bam.bam.bai
- Filtered feature-barcode matrices MEX:    /anvil/scratch/x-liu2302/unit1_demo/data/cellranger/Control-CAF/outs/filtered_feature_bc_matrix
- Filtered feature-barcode matrices HDF5:   /anvil/scratch/x-liu2302/unit1_demo/data/cellranger/Control-CAF/outs/filtered_feature_bc_matrix.h5
- Unfiltered feature-barcode matrices MEX:  /anvil/scratch/x-liu2302/unit1_demo/data/cellranger/Control-CAF/outs/raw_feature_bc_matrix
- Unfiltered feature-barcode matrices HDF5: /anvil/scratch/x-liu2302/unit1_demo/data/cellranger/Control-CAF/outs/raw_feature_bc_matrix.h5
- Secondary analysis output CSV:            /anvil/scratch/x-liu2302/unit1_demo/data/cellranger/Control-CAF/outs/analysis
- Per-molecule read information:            /anvil/scratch/x-liu2302/unit1_demo/data/cellranger/Control-CAF/outs/molecule_info.h5
- CRISPR-specific analysis:                 null
- CSP-specific analysis:                    null
- Loupe Browser file:                       /anvil/scratch/x-liu2302/unit1_demo/data/cellranger/Control-CAF/outs/cloupe.cloupe
- Feature Reference:                        null
- Target Panel File:                        null
- Probe Set File:                           null

Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!

2025-03-05 01:29:14 Shutting down.
The code above is just an example for processing one sample file.

We have processed all samples. All output files are saved at: "/anvil/projects/x-tra220018/current/datasets/single_cellData/Ratliff_CAF/results/Control-CAF/outs/". We will directly load the count data matrix in folder "filtered_feature_bc_matrix" for analysis with Seurat.

Quality Control and Analysis using Seurat¶

For the following processing and analysis steps, we will use Seurat, a popular R package that provides well-curated functions and workflows for scRNA-seq data. Seurat was first developed for clustering scRNAseq data, but with continuing updates over the last few years it has also become a popular tool for QC, analysis, and exploration. Seurat is easy to use and powerful, and its workflows are well maintained and regularly updated. For more information, see the Seurat website from the Satija Lab, which has very nice documentation, links to the Satija Lab publications, and detailed tutorials and vignettes.

In this notebook, we use Seurat 4.0. The slot-style accessors shown below, such as caf_ctrl[["RNA"]]@counts, follow the object layout of that version.

QC and clustering for each set (control as an example)¶

Before we can perform any analysis, we need to import the pre-processed data into R and set up a Seurat object. The Read10X command reads the filtered feature-barcode matrix generated by cellranger count and returns a count matrix, caf_ctrl_data. Each row of this matrix is a feature (gene) and each column is a cell. The values are the counts produced by cellranger: each entry is the number of unique molecules (UMIs) observed for a feature in a cell.

Next, we will use this count matrix to create a Seurat object, caf_ctrl, which serves as a container for data, analysis, and metadata. The count matrix is stored as caf_ctrl[["RNA"]]@counts.

In this section, we will only take the "Control" group as an example. After running the following code, we can see that the "Control" group contains 33694 genes and 3321 cells.

The following code might generate a warning message and you can ignore it.

In [2]:
%use r
# -------------------- Running only on Control treatment ----------------- #
# Create Seurat object
library(Seurat)
data_path="/anvil/projects/x-tra220018/current/datasets/single_cellData/Ratliff_CAF/results"
caf_ctrl_data <- Read10X(data.dir = paste0(data_path, "/Control-CAF/outs/filtered_feature_bc_matrix"))
caf_ctrl      <- CreateSeuratObject(counts = caf_ctrl_data, project = "CAFCTRL")
caf_ctrl
# caf_ctrl[["RNA"]]@counts
'as(<dgTMatrix>, "dgCMatrix")' is deprecated.
Use 'as(., "CsparseMatrix")' instead.
See help("Deprecated") and help("Matrix-deprecated").

Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
An object of class Seurat 
33694 features across 3321 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)

Standard pre-process workflow¶

First, we obtain a list of mitochondrial genes, which we will use to identify potentially stressed or damaged cells. We do this with grep, a pattern-matching function, by searching all gene names for the pattern that denotes mitochondrial genes. Here the pattern is "^MT-": the "^" anchors the match to the beginning of the gene name, so only names that start with "MT-" are returned. Note that mitochondrial genes may be named differently in other genome versions and annotations. If grep finds no row names matching the pattern, check the genome version and annotation you used to see how mitochondrial genes are named. Some genomes use "M" or "Mito" instead; simply search on whatever pattern is appropriate for your data. Setting "value = TRUE" returns a vector of the matching names themselves rather than a vector of their indices.

Next, you will calculate the percentage of counts in each cell that come from genes matching this pattern (that is, the percentage mapping to mitochondrial transcripts) using the Seurat function PercentageFeatureSet with the same pattern. This information is added to the metadata of the caf_ctrl Seurat object as percent.mito.

In [3]:
# ------ Quality Control ------ #
# Percent mitochondrial genes
mito.genes   <- grep(pattern = "^MT-", x = rownames(x = caf_ctrl[["RNA"]]@counts), value = TRUE)
caf_ctrl[["percent.mito"]] <- PercentageFeatureSet(caf_ctrl, pattern = "^MT-")
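To make the calculation concrete, here is a minimal base-R sketch of the percentage described above, computed by hand. The toy matrix and the names counts, mito_genes and percent_mito are made up for illustration; this is not Seurat's internal code.

```r
# Toy UMI count matrix: rows are genes, columns are cells (values are made up).
counts <- matrix(
  c(10,  0,  5,    # MT-CO1
     5,  1,  0,    # MT-ND4
    60,  9, 20,    # ACTB
    25, 10, 25),   # GAPDH
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND4", "ACTB", "GAPDH"),
                  c("cell1", "cell2", "cell3"))
)

# Same pattern as in the cell above: gene names that start with "MT-".
mito_genes <- grep(pattern = "^MT-", x = rownames(counts), value = TRUE)

# Percentage of each cell's counts that come from mitochondrial genes.
percent_mito <- 100 * colSums(counts[mito_genes, , drop = FALSE]) / colSums(counts)
print(percent_mito)
```

In this toy example cell1 has 15 mitochondrial counts out of 100, so its value is 15.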

Next, we will visualize the resulting data by making violin plots of specific features. In this case, we are looking at nCount_RNA (the number of molecules detected in a cell), nFeature_RNA (the number of genes detected in a cell), and percent.mito, which you calculated above. Each of these measures gives you an idea of the quality of the cells in your dataset. Low nCount_RNA means the cell could be dead or dying, or the droplet may have been empty. Low nFeature_RNA can likewise indicate a dead or dying cell, whereas high nFeature_RNA may indicate a doublet or multiplet. These features can be used in combination to filter the dataset and remove damaged or low-quality cells and doublets.

In [4]:
# Visualize QC metrics as violin plots
options(repr.plot.width=15, repr.plot.height=6) # set plot size
VlnPlot(object = caf_ctrl, features = c("nFeature_RNA", "nCount_RNA", "percent.mito"), ncol = 3)

Another way to visualize these metrics is the FeatureScatter function, which generates scatter plots of pairs of variables; here we plot nCount_RNA against percent.mito and against nFeature_RNA. A balance must be struck between keeping as much data as possible and removing possibly damaged cells and multiplets. FeatureScatter is typically used to visualize feature-feature relationships, but it can be used for anything calculated by the object, i.e. columns in the object metadata, PC scores, etc.

In [5]:
# Scatter plot for nCount_RNA, percent.mito, nFeature_RNA and filtering based on these variables
# FeatureScatter is typically used to visualize feature-feature relationships, but can be used
# for anything calculated by the object, i.e. columns in object metadata, PC scores etc.

plot1 <- FeatureScatter(caf_ctrl, feature1 = "nCount_RNA", feature2 = "percent.mito")
plot2 <- FeatureScatter(caf_ctrl, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
library(patchwork)
plot1 + plot2 +plot_layout(ncol=2)

After visualizing the QC metrics, we are ready to filter the data based on plots using the subset function. Here we use a very loose condition to simply remove dead/damaged cells as well as potential doublets/multiplets. You may adjust the parameters as you like based on the plots.

  • We keep cells that have unique feature counts over 2000;
  • We keep cells with 10000 < counts <90000;
  • We keep cells that have < 30% mitochondrial counts.
In [6]:
# Filter cells
caf_ctrl <- subset(caf_ctrl, subset = nFeature_RNA > 2000 & 
                   percent.mito < 30 & nCount_RNA >10000 & nCount_RNA <90000)
save(caf_ctrl, file="./data/caf_ctrl_QC.RData")
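When adjusting the thresholds, it can help to see how many cells each filter would keep before calling subset. The sketch below does this on a small made-up table; qc is a hypothetical stand-in for the per-cell metadata (caf_ctrl@meta.data in this notebook), and the thresholds are the ones used above.

```r
# Hypothetical per-cell QC metadata with the same column names Seurat uses.
qc <- data.frame(
  nFeature_RNA = c(3500, 1500, 6200, 5800, 2500),
  nCount_RNA   = c(15000, 4000, 50000, 95000, 12000),
  percent.mito = c(21, 45, 4, 3, 35)
)

# One logical column per filter, using the thresholds from the cell above.
pass <- data.frame(
  features = qc$nFeature_RNA > 2000,
  counts   = qc$nCount_RNA > 10000 & qc$nCount_RNA < 90000,
  mito     = qc$percent.mito < 30
)

# Cells kept by each filter on its own, and by all filters together.
kept_per_filter <- colSums(pass)
kept_overall    <- sum(rowSums(pass) == ncol(pass))
print(kept_per_filter)
print(kept_overall)
```

A filter that removes far more cells than the others is a good candidate for a second look at the corresponding plot.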

Normalizing the data¶

After removing unwanted cells, we are ready to normalize the data using the NormalizeData command. Seurat offers a number of normalization methods. In this example we use the global-scaling method "LogNormalize", which Seurat employs by default: the gene expression measurements for each cell are divided by the total expression of that cell, multiplied by a scale factor, and log-transformed. Here we set the scale factor to 1,000,000, which makes the result equivalent to log CPM (counts per million); another common choice is 10,000, which is the Seurat default. The normalized values are then stored in the caf_ctrl object.

In [7]:
%use r
library(Seurat)
load("./data/caf_ctrl_QC.RData")
caf_ctrl <- NormalizeData(object = caf_ctrl, normalization.method = "LogNormalize", scale.factor = 1000000)
# Normalized values are stored in caf_ctrl[["RNA"]]@data.
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
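The arithmetic behind "LogNormalize" can be sketched in a few lines of base R. The toy matrix below is made up, and the sketch assumes the log transform is log1p (natural log of 1 + x); it illustrates the calculation rather than reproducing Seurat's code.

```r
# Toy UMI counts: 3 genes x 2 cells (values are made up).
# The matrix is filled column by column, so cell1 = (1, 3, 6), cell2 = (0, 20, 80).
counts <- matrix(c(1, 3, 6,
                   0, 20, 80),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2")))

scale_factor <- 1e6  # same value as scale.factor in the cell above

# Divide each column (cell) by its total, multiply by the scale factor,
# then apply log1p, i.e. the natural log of (1 + x).
cpm     <- sweep(counts, 2, colSums(counts), "/") * scale_factor
lognorm <- log1p(cpm)
print(round(lognorm, 3))
```

After the first step every cell sums to the scale factor, which is what makes cells with different sequencing depth comparable.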

Identification of highly variable features¶

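Next, we reduce the number of features used downstream by finding highly variable genes, which highlight biological signal (Brennecke et al. 2013), and focus on these in the downstream analysis. This is done with the FindVariableFeatures function (called FindVariableGenes in older Seurat versions). In the dispersion-based approach, the function calculates the average expression and dispersion for every gene and places the genes into bins; Seurat then calculates a z-score for dispersion within each bin. This helps control for the mean-variance relationship that we discussed in the bulk RNA-seq section and is discussed in more detail in Macosko et al. The code below uses selection.method = "vst", which addresses the same relationship by fitting a curve to the variance as a function of the mean and ranking genes by their standardized variance.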

By default, 2000 variable features are selected from each dataset, which we can see using the length function in R.

In [8]:
%use r
# Finding highly variable genes (feature selection), we return 2000 features per dataset by default
caf_ctrl <- FindVariableFeatures(object = caf_ctrl, selection.method ="vst", 
                                 mean.cutoff = c(0.1,6), dispersion.cutoff = c(0.5,Inf),verbose=TRUE)
length(VariableFeatures(caf_ctrl))
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
2000
Calculating feature variances of standardized and clipped values
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
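The binning idea described above (mean and dispersion per gene, then a z-score of dispersion within bins of similar mean) can be sketched in base R as follows. This illustrates the dispersion-based approach on simulated data and is not the "vst" method that the cell above runs; the number of bins and all variable names are assumptions made for the example.

```r
set.seed(1)
n_genes <- 200
n_cells <- 50
# Simulated normalized expression, genes in rows (made-up values).
expr <- matrix(rgamma(n_genes * n_cells, shape = 2, rate = 1), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

gene_mean  <- rowMeans(expr)
dispersion <- apply(expr, 1, var) / gene_mean   # variance-to-mean ratio

# Place genes into 10 bins of similar mean expression.
breaks <- quantile(gene_mean, probs = seq(0, 1, by = 0.1))
bins   <- cut(gene_mean, breaks = breaks, include.lowest = TRUE)

# z-score of the dispersion within each bin.
disp_z <- ave(dispersion, bins, FUN = function(d) (d - mean(d)) / sd(d))
names(disp_z) <- rownames(expr)

# Genes ranked by how unusual their dispersion is for their expression level.
print(head(sort(disp_z, decreasing = TRUE), 5))
```

Because the z-score is computed within each bin, a gene is compared only with genes of similar average expression, which is how the mean-variance relationship is controlled.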

These features can be visualized with VariableFeaturePlot; the head function from base R extracts the 10 most variable features. The variable features are highlighted in red in the plots, and the top 10 are labeled in the second plot. (There might be several warning messages and you can safely ignore them.)

In [9]:
%use r
library(patchwork)
# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(caf_ctrl), 10)
#top10 # check top 10 highly variable genes

# plot variable features with and without labels
plot1 <- VariableFeaturePlot(caf_ctrl)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot1 + plot2 + plot_layout(ncol=2)
When using repel, set xnudge and ynudge to 0 for optimal results

Warning message:
“Transformation introduced infinite values in continuous x-axis”
Warning message:
“Transformation introduced infinite values in continuous x-axis”

Cell cycle correction¶

Single cell datasets often contain many uninteresting sources of variation, such as technical noise, cell cycle stage, and batch effects. A major source of systematic bias in scRNAseq is the cell cycle, which usually introduces within-cell-type heterogeneity. This bias can obscure the differences in expression between cell types. Regressing out these signals can vastly improve downstream dimensionality reduction, clustering, and differential expression analyses.

To remove these effects, Seurat constructs linear models to predict gene expression from the variables we specify. The scaled z-score residuals of these linear models are then stored in the Seurat object and used for dimensionality reduction and clustering. Before we can regress out cell cycle effects, we must determine where the cells are in the cell cycle. We do this by calculating cell-cycle scores, which we then regress out together with other potentially uninteresting sources of variation, such as the number of detected molecules per cell and the percentage of counts mapping to mitochondrial genes.

The cell cycle scores are calculated using CellCycleScoring function in Seurat, and results are saved in the object meta data.

Here, we load a list of genes associated with either the S phase or the G2/M phase of the cell cycle. These markers, originally published in (Kowalczyk et al, 2015) are loaded with Seurat. Note that although Seurat predicts the cell cycle phase of each cell, these predictions are not used in downstream data analysis. Instead, Seurat uses the quantitative cell cycle score in downstream scaling. (It might produce several warning messages, but this won't affect our results.)

In [16]:
%use r
# Cell cycle scoring
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes

caf_ctrl  <- CellCycleScoring(caf_ctrl, s.features = s.genes, g2m.features = g2m.genes, 
                                 set.ident = F)
head(caf_ctrl@meta.data)
unique(caf_ctrl@meta.data$Phase)
Warning message:
“The following features are not present in the object: MLF1IP, not searching for symbol synonyms”
A data.frame: 6 × 7
                   orig.ident nCount_RNA nFeature_RNA percent.mito     S.Score  G2M.Score Phase
                   <fct>           <dbl>        <int>        <dbl>       <dbl>      <dbl> <chr>
AAACCTGAGACGCTTT-1 CAFCTRL         14952         3489    21.421883 -0.21850143  0.1015109 G2M
AAACCTGAGGGTATCG-1 CAFCTRL         26371         5111     9.408062  0.07548027 -0.1098930 S
AAACCTGGTCTAGTCA-1 CAFCTRL         49808         6210     4.051558 -0.08005434 -0.1029985 G1
AAACCTGGTGATGTGG-1 CAFCTRL         48327         5831     3.041778  0.66149059  1.3811409 G2M
AAACCTGGTTATCACG-1 CAFCTRL         69105         6968     5.348383  0.20619971 -0.2843568 S
AAACCTGGTTGGTTTG-1 CAFCTRL         21576         4376     4.718205  0.09631241 -0.1565114 S
  1. 'G2M'
  2. 'S'
  3. 'G1'
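The Phase column is consistent with a simple rule applied to the two scores. The sketch below applies that rule to the six cells printed above; the rule as written here (G1 when both scores are negative, otherwise the phase with the higher score) is an assumption about how the labels are derived, and assign_phase is a hypothetical helper, not a Seurat function.

```r
# S and G2/M scores for the six cells printed above.
scores <- data.frame(
  S.Score   = c(-0.21850143, 0.07548027, -0.08005434, 0.66149059, 0.20619971, 0.09631241),
  G2M.Score = c(0.1015109, -0.1098930, -0.1029985, 1.3811409, -0.2843568, -0.1565114)
)

# Assumed rule: G1 when both scores are negative, otherwise the phase with
# the larger score (ties are not handled in this sketch).
assign_phase <- function(s, g2m) {
  ifelse(s < 0 & g2m < 0, "G1", ifelse(s > g2m, "S", "G2M"))
}

scores$Phase <- assign_phase(scores$S.Score, scores$G2M.Score)
print(scores)
```

For these six cells the rule gives G2M, S, G1, G2M, S, S, which matches the Phase column in the output.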

Scaling the data¶

Before performing a dimension reduction method such as PCA, we need to adjust for the cell cycle and remove other unwanted sources of variation in the expression data. We add the difference between the S and G2/M scores to the metadata of the caf_ctrl Seurat object as CC.Difference (note that there are two different ways to add information to the metadata: the [[ ]] accessor used earlier for percent.mito and direct assignment to @meta.data used here). Regressing out this difference, rather than both scores, keeps the signal that separates cycling from non-cycling cells while removing differences in phase among cycling cells. We then use the function ScaleData to regress out the variables that we have decided could introduce uninteresting variability. The goal of this step is to:

  1. 'Regress out' heterogeneity associated with cell cycle phase, total counts per cell, and the percentage of mitochondrial counts.
  2. Shift the mean expression of each gene across cells to 0.
  3. Scale the expression of each gene so that its variance across cells equals 1.

By performing scaling, we adjust the weight of each gene such that genes with large counts won't dominate low-expressed genes in downstream analysis.

In [17]:
%use r
caf_ctrl@meta.data$CC.Difference <- caf_ctrl@meta.data$S.Score - caf_ctrl@meta.data$G2M.Score
caf_ctrl <- ScaleData(object = caf_ctrl, vars.to.regress = c("nCount_RNA", "percent.mito", "CC.Difference"))
Regressing out nCount_RNA, percent.mito, CC.Difference

Centering and scaling data matrix
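For a single gene, the regress-then-scale idea can be sketched with lm and scale from base R. The data are simulated and the variable names are hypothetical; it is meant as an illustration of the three goals listed above, not as Seurat's implementation.

```r
set.seed(42)
n_cells <- 100
# Hypothetical covariate (total counts per cell) and one gene that depends on it.
n_count <- runif(n_cells, 10000, 90000)
gene    <- 0.5 + 2e-5 * n_count + rnorm(n_cells, sd = 0.3)

# Step 1: regress the gene on the unwanted covariate and keep the residuals.
fit <- lm(gene ~ n_count)
res <- residuals(fit)

# Step 2: center and scale the residuals (mean 0, variance 1 across cells).
scaled <- as.numeric(scale(res))

print(round(c(mean = mean(scaled),
              var  = var(scaled),
              cor_with_covariate = cor(scaled, n_count)), 6))
```

The scaled residuals have mean 0 and variance 1, and they are no longer correlated with the covariate that was regressed out.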

Always remember to save the intermediate Seurat object to avoid rerunning previous steps.

In [18]:
save(caf_ctrl, file="./data/caf_ctrl_norm.RData")

Clustering¶

Perform linear dimensional reduction and visualization¶

Now, we will perform PCA on the scaled data. With UMI data, PCA tends to return similar results whether it is run on all genes or only on the highly variable subset. Most workflows therefore run dimensionality reduction and the subsequent clustering on the highly variable genes only, which reduces the computational resources and time needed and helps to highlight biological signal.

PCA is performed on the scaled data with the RunPCA function. Several visualization methods are shown below using DimPlot, VizDimLoadings and DimHeatmap.

In [19]:
library(Seurat)
load("./data/caf_ctrl_norm.RData")
caf_ctrl <- RunPCA(object = caf_ctrl, verbose=F)
DimPlot(object = caf_ctrl)

Here we visualize the top genes associated with reduction components for the first 2 PCs.

In [20]:
VizDimLoadings(object = caf_ctrl, dims = 1:2)

DimHeatmap is a useful way to identify the primary sources of heterogeneity, and is often used to determine how many PCs to include in downstream analysis. Here, we generate heatmaps for the first 9 PCs. The cells and genes are sorted by their principal component scores, and the heatmaps allow us to visualize the heterogeneity in the data. By setting cells = 500, we plot the 500 most extreme cells at both ends of the spectrum.

In [21]:
DimHeatmap(object = caf_ctrl, dims = 1:9, cells = 500, balanced = TRUE)

Determine the 'dimensionality' of the dataset¶

Next, we want to determine the dimensionality of the data. This will allow us to decide how many dimensions to use in downstream analyses, as not all the dimensions are likely to be important. The JackStraw function randomly permutes a subset of the data (by default 1%) and calculates projected PCA scores for these permuted genes. Comparing the PCA scores from this null distribution with the scores from the observed data lets us calculate a p-value for each gene's association with each principal component. The ScoreJackStraw function then computes a significance score for each PC. Significant PCs are expected to show a p-value distribution that is strongly skewed toward low values compared with the null distribution. The p-value for each PC is based on a proportion test that compares the proportion of features with a p-value below a threshold (1e-05) with the proportion expected under a uniform distribution of p-values. Lastly, we plot these results.

The following code will take several minutes to complete.
In [22]:
caf_ctrl <- JackStraw(object = caf_ctrl, num.replicate = 100, verbose = FALSE)
caf_ctrl <- ScoreJackStraw(caf_ctrl, dims = 1:20)

JackStrawPlot(object = caf_ctrl, dims = 1:20)
Warning message:
“Removed 28000 rows containing missing values (`geom_point()`).”

Alternatively, we can determine which PCs to include by looking at an 'elbow plot', which ranks the principal components by the amount of variance each one explains. We look for an "elbow": the point after which additional PCs capture little further variation in the data (around PC 8 or 9 here). In practice, it can be difficult to determine the number of PCs to use in downstream analysis. However, if you run the later code with various numbers of PCs (8, 9, 30, ...), you will see that the results generally do not differ much.

In [23]:
ElbowPlot(object = caf_ctrl)
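The quantity behind an elbow plot can also be tabulated. The sketch below uses prcomp from base R on simulated data and converts the standard deviation of each PC into a percentage of variance explained; the data and the variable names are made up for illustration, and with a Seurat object the percentages would be relative to the PCs that were actually computed.

```r
set.seed(7)
# Simulated data: 300 cells x 20 features (made-up values).
x <- matrix(rnorm(300 * 20), nrow = 300)
# Add one shared signal to features 1-5 and a weaker one to features 6-10.
x[, 1:5]  <- x[, 1:5]  + rnorm(300, sd = 3)
x[, 6:10] <- x[, 6:10] + rnorm(300, sd = 2)

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Convert per-PC standard deviations to percentage of variance explained.
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(pct_var)

print(round(head(data.frame(PC = seq_along(pct_var), pct_var, cum_var), 5), 1))
```

In this simulation two signals were planted, so the percentage drops sharply after the second PC; that drop is the "elbow".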

Choosing the correct number of dimensions can be challenging. We only choose the first 10 PCs here, but we encourage you to repeat the clustering and downstream analysis with different numbers of PCs.

Cluster the cells¶

The next step in the analysis is to use unsupervised clustering to group cells into clusters of similar cells. Seurat uses a graph-based clustering technique built upon (Macosko et al, 2015) as well as (Xu & Su, 2015) and (Levine et al, 2015). Cells are embedded in a graph structure, by default a K-nearest neighbor (KNN) graph, with edges connecting cells that have similar gene expression patterns; the graph is then partitioned into highly connected communities. The KNN graph is built by the FindNeighbors function, in this case from the first 10 PCs. Next, the FindClusters function groups similar cells together, using the Louvain algorithm by default; the resolution parameter controls the granularity of the clustering. We recommend trying several resolution settings. Higher resolution leads to more clusters, and in practice you will usually either over- or under-cluster your data. If the data are over-clustered, we can decrease the resolution or merge similar clusters afterwards, if need be. As a rule of thumb, larger datasets need a higher resolution to avoid under-clustering.

In [24]:
# cluster the cells
caf_ctrl <- FindNeighbors(caf_ctrl, reduction="pca", dims = 1:10, verbose = F, force.recalc = T)
caf_ctrl <- FindClusters(object = caf_ctrl, resolution = 0.4)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 3199
Number of edges: 100595

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.7737
Number of communities: 7
Elapsed time: 0 seconds
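To give a feel for the graph that the clustering works on, here is a simplified base-R sketch: it finds the k nearest neighbours of each cell in a simulated "PC space" and measures how much two cells' neighbour sets overlap (a Jaccard index, the kind of edge weight used in a shared-nearest-neighbour graph). It uses exact distances, excludes each cell from its own neighbour set, and all names are hypothetical, so it should be read as an illustration rather than as what FindNeighbors does internally.

```r
set.seed(3)
# Two simulated groups of 20 cells in a 5-dimensional "PC space" (made-up data).
pcs <- rbind(matrix(rnorm(20 * 5, mean = 0), ncol = 5),
             matrix(rnorm(20 * 5, mean = 6), ncol = 5))
k <- 5

# k nearest neighbours of every cell by Euclidean distance (excluding itself).
d <- as.matrix(dist(pcs))
diag(d) <- Inf
knn <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))

# Jaccard overlap between the neighbour sets of two cells.
jaccard <- function(i, j) {
  length(intersect(knn[i, ], knn[j, ])) / length(union(knn[i, ], knn[j, ]))
}

print(jaccard(1, 2))    # two cells from the same group
print(jaccard(1, 21))   # two cells from different groups
```

Cells from different groups share no neighbours here, so the edge between them has weight 0; community detection then looks for sets of cells that are densely connected by high-weight edges.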

Run non-linear dimensional reduction (UMAP/tSNE)¶

Next, we will visualize the clusters we just identified using non-linear dimensional reduction techniques. Seurat offers several non-linear dimension reduction and visualization methods, such as tSNE and UMAP. Both aim to place cells with similar local neighborhoods close together in a low-dimensional space, providing an informative visualization of the heterogeneity in the data.

Clustering on tSNE components is not advised, but tSNE is a powerful visualization technique. As input to the RunTSNE function, use the same dimensions you used as input to the clustering functions. The tSNE algorithm will place cells with similar neighborhoods in the graph embedding into similar locations in low-dimensional space. This will allow you to visualize in 2D the high-dimensional clustering that you did earlier. UMAP is a similar technique, which is faster and, some argue, better maintains the global structure of the data in low-dimensional space, though others argue that this mainly depends on the parameter settings chosen when running the reduction.

In [25]:
caf_ctrl <- RunTSNE(object = caf_ctrl, dims = 1:10, do.fast = TRUE)
DimPlot(object = caf_ctrl, reduction="tsne")
In [26]:
caf_ctrl <- RunUMAP(object = caf_ctrl, dims = 1:10,min.dist=0.01,spread=3)
DimPlot(object = caf_ctrl, reduction="umap")
Warning message:
“The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session”
00:53:09 UMAP embedding parameters a = 0.3356 b = 0.7939

00:53:09 Read 3199 rows and found 10 numeric columns

00:53:09 Using Annoy for neighbor search, n_neighbors = 30

00:53:09 Building Annoy index with metric = cosine, n_trees = 50

0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|

00:53:10 Writing NN index file to temp file /tmp/RtmpP8q2j1/file24643c670c4278

00:53:10 Searching Annoy index using 1 thread, search_k = 3000

00:53:11 Annoy recall = 100%

00:53:11 Commencing smooth kNN distance calibration using 1 thread
 with target n_neighbors = 30

00:53:12 Initializing from normalized Laplacian + noise (using irlba)

00:53:12 Commencing optimization for 500 epochs, with 124700 positive edges

00:53:15 Optimization finished

Save the clustered data into a new RData file.

In [27]:
save(caf_ctrl, file="./data/caf_ctrl_clustered.RData")

DE analysis (using both Seurat and edgeR)¶

There are many different methods to identify differentially expressed genes. Here, we will show you two ways to identify cluster biomarkers (differentially expressed genes that distinguish the clusters). One is edgeR (the same as in bulk RNA-seq), and the other is Seurat, which provides its own methods for finding markers.

Seurat-- Find differentially expressed features (cluster biomarkers)¶

In the first method, we use FindMarkers to run a nonparametric Wilcoxon rank-sum test to identify biomarkers. The min.pct argument requires a gene to be detected in at least that fraction of cells in either of the two groups being compared. The ident.1 argument gives the ID of the cluster of interest; with no ident.2, that cluster is compared with all other cells, which identifies genes that differ between cluster X and all other clusters. In the code below, many of the lines are commented out, because each call takes a few minutes to finish. You can just run one of them as an example.

Below we start by finding genes that are differentially expressed between cluster 0 and all other clusters. If you only want genes that are upregulated in cluster 0 compared with the other clusters, you can add the argument "only.pos = TRUE"; here, however, we want to identify genes that are either higher or lower in cluster 0.

You can compare other clusters using similar code. The code for the other clusters is commented out below to save running time.

In [28]:
library(Seurat)
load("./data/caf_ctrl_clustered.RData")
# Find Markers, each will take a few minutes to finish, so we only run first one as an example 

# Cluster 0
cluster0.markers <- FindMarkers(object = caf_ctrl, ident.1 = 0, min.pct = 0.1)
print(x = head(x = cluster0.markers, n = 5))

# Cluster 1
# cluster1.markers <- FindMarkers(object = caf_ctrl, ident.1 = 1, min.pct = 0.1)
# print(x = head(x = cluster1.markers, n = 5))

# Cluster 2
# cluster2.markers <- FindMarkers(object = caf_ctrl, ident.1 = 2, min.pct = 0.1)
# print(x = head(x = cluster2.markers, n = 5))

# Cluster 3
# cluster3.markers <- FindMarkers(object = caf_ctrl, ident.1 = 3, min.pct = 0.1)
# print(x = head(x = cluster3.markers, n = 5))

# Cluster 4
# cluster4.markers <- FindMarkers(object = caf_ctrl, ident.1 = 4, min.pct = 0.1)
# print(x = head(x = cluster4.markers, n = 5))

# Cluster 5
# cluster5.markers <- FindMarkers(object = caf_ctrl, ident.1 = 5, min.pct = 0.1)
# print(x = head(x = cluster5.markers, n = 5))
             p_val avg_log2FC pct.1 pct.2    p_val_adj
CALM2 9.565361e-58  0.2933801 1.000 1.000 3.222953e-53
NNMT  1.518187e-56  0.4385994 0.998 0.992 5.115379e-52
TIMP2 1.037616e-55 -0.5730115 0.989 0.990 3.496144e-51
GLRX  1.584119e-55  0.5557841 0.993 0.984 5.337529e-51
NQO1  3.415059e-51  0.4186482 1.000 0.988 1.150670e-46
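The test itself can be sketched in base R with wilcox.test and p.adjust. The expression matrix below is simulated, and gene1 is a planted marker; the Bonferroni adjustment mirrors the output above, where p_val_adj divided by p_val equals 33694, the number of features in the object. The variable names are hypothetical and the sketch is not Seurat's implementation.

```r
set.seed(11)
n_genes <- 50
# Simulated normalized expression: 50 genes x 60 cells, 30 cells per group.
expr <- matrix(rnorm(n_genes * 60), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
in_cluster <- rep(c(TRUE, FALSE), each = 30)
expr["gene1", in_cluster] <- expr["gene1", in_cluster] + 3  # one planted marker

# Wilcoxon rank-sum test per gene: cluster of interest versus all other cells.
p_val <- apply(expr, 1, function(g) wilcox.test(g[in_cluster], g[!in_cluster])$p.value)

# Bonferroni adjustment across all tested genes.
p_val_adj <- p.adjust(p_val, method = "bonferroni")

print(head(sort(p_val_adj), 3))
```

The planted marker has by far the smallest adjusted p-value; genes without a real difference mostly end up with an adjusted p-value of 1.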

We can also use the FindAllMarkers function to automate this process for all clusters. It compares cluster 0 with all other clusters, cluster 1 with all other clusters, and so on. Here we set min.pct = 0.25, so a gene must be detected in at least 25% of the cells in either group, and logfc.threshold = 0.25, so the difference between the two groups must be at least 0.25 for a gene to be tested. Because we test on the scaled data (slot = 'scale.data'), the effect size is reported as avg_diff, a difference in mean scaled expression, rather than as a log fold-change. We then select the top 15 markers for each cluster based on avg_diff.

The code below will take 10-15 minutes to finish.
In [29]:
 
# Find all Markers, this code will take a long time to complete 
caf_ctrl_markers <- FindAllMarkers(object = caf_ctrl, only.pos = FALSE, min.pct = 0.25, 
                               logfc.threshold = 0.25, slot='scale.data')
head(caf_ctrl_markers)
Calculating cluster 0

Calculating cluster 1

Calculating cluster 2

Calculating cluster 3

Calculating cluster 4

Calculating cluster 5

Calculating cluster 6

A data.frame: 6 × 7
                p_val  avg_diff pct.1 pct.2    p_val_adj cluster     gene
                <dbl>     <dbl> <dbl> <dbl>        <dbl>   <fct>    <chr>
GLRX     8.137034e-62 0.5445789 0.760 0.449 1.627407e-58       0     GLRX
GCLM     1.063910e-59 0.5419747 0.792 0.513 2.127820e-56       0     GCLM
CALM2    6.267483e-58 0.5732416 0.741 0.429 1.253497e-54       0    CALM2
SBF2-AS1 5.351493e-53 0.5145232 0.810 0.557 1.070299e-49       0 SBF2-AS1
NNMT     6.714445e-53 0.5136216 0.762 0.464 1.342889e-49       0     NNMT
NQO1     7.784938e-51 0.5168075 0.752 0.449 1.556988e-47       0     NQO1
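
The p_val_adj column can be checked by hand. The printed values are consistent with a Bonferroni correction in which the raw p-value is multiplied by the number of features available to the test: 2000 here (which would fit a scale.data slot holding 2000 variable features, an assumption about the earlier preprocessing) and 33694 for the FindMarkers call on cluster 0 earlier in this section. The sketch below reproduces both with base R; the variable names are ours.

```r
# GLRX from the FindAllMarkers output (slot = 'scale.data')
p_scaled   <- 8.137034e-62
adj_scaled <- p.adjust(p_scaled, method = "bonferroni", n = 2000)

# CALM2 from the earlier FindMarkers output for cluster 0
p_full   <- 9.565361e-58
adj_full <- p.adjust(p_full, method = "bonferroni", n = 33694)

print(adj_scaled)  # compare with the printed p_val_adj for GLRX
print(adj_full)    # compare with the printed p_val_adj for CALM2
```

This also explains why the same gene can have a different adjusted p-value in the two tables even when the raw p-values are similar.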
In [30]:
library(dplyr)
top15_logFC <- caf_ctrl_markers %>% group_by(cluster) %>% top_n(15, avg_diff)
top15_logFC
A grouped_df: 105 × 7
        p_val  avg_diff pct.1 pct.2     p_val_adj cluster gene
        <dbl>     <dbl> <dbl> <dbl>         <dbl> <fct>   <chr>
 8.137034e-62 0.5445789 0.760 0.449  1.627407e-58 0       GLRX
 1.063910e-59 0.5419747 0.792 0.513  2.127820e-56 0       GCLM
 6.267483e-58 0.5732416 0.741 0.429  1.253497e-54 0       CALM2
 5.351493e-53 0.5145232 0.810 0.557  1.070299e-49 0       SBF2-AS1
 6.714445e-53 0.5136216 0.762 0.464  1.342889e-49 0       NNMT
 7.784938e-51 0.5168075 0.752 0.449  1.556988e-47 0       NQO1
 2.067844e-49 0.4926607 0.790 0.537  4.135688e-46 0       TNFRSF12A
 3.482508e-48 0.5056353 0.784 0.565  6.965015e-45 0       STC2
 1.026793e-46 0.5005518 0.724 0.422  2.053586e-43 0       PRDX1
 2.158916e-44 0.4865853 0.734 0.477  4.317832e-41 0       TXNRD1
 3.032248e-42 0.5090307 0.714 0.466  6.064496e-39 0       PFN1
 9.318796e-41 0.4768871 0.685 0.438  1.863759e-37 0       TPM1
 1.072448e-40 0.4723494 0.790 0.538  2.144897e-37 0       TNFRSF11B
 4.503110e-40 0.4728337 0.726 0.458  9.006220e-37 0       CFL1
 8.663760e-40 0.4687297 0.797 0.571  1.732752e-36 0       IGFBP3
6.320753e-143 0.8086531 0.837 0.393 1.264151e-139 1       OST4
6.774061e-104 0.6259537 0.845 0.471 1.354812e-100 1       CAMK2N1
 2.572495e-94 0.7583933 0.773 0.406  5.144990e-91 1       MT-ND4
 4.137869e-86 0.6795128 0.782 0.467  8.275739e-83 1       FSTL1
 4.404964e-85 0.7392704 0.747 0.429  8.809928e-82 1       TMSB10
 2.026055e-77 0.5616456 0.823 0.523  4.052111e-74 1       IGF2
 3.007625e-71 0.5847735 0.785 0.471  6.015250e-68 1       TIMP2
 2.074076e-60 0.5727921 0.792 0.531  4.148153e-57 1       NABP1
 6.887300e-56 0.6209232 0.682 0.441  1.377460e-52 1       SELM
 2.792222e-55 0.5530528 0.710 0.446  5.584444e-52 1       POLR2L
 1.101790e-54 0.6073151 0.693 0.419  2.203580e-51 1       TMSB4X
 7.168131e-47 0.6201773 0.680 0.463  1.433626e-43 1       FTL
 9.000951e-45 0.5508612 0.819 0.609  1.800190e-41 1       MRVI1
 8.235448e-43 0.5424776 0.809 0.588  1.647090e-39 1       USP53
 1.996406e-30 0.5337986 0.582 0.319  3.992812e-27 1       SCUBE3
            ⋮         ⋮     ⋮     ⋮             ⋮ ⋮       ⋮
 2.652816e-36  4.544904 1.000 0.299  5.305632e-33 5       MKI67
 3.632837e-34  4.565972 0.982 0.303  7.265674e-31 5       CEP55
 5.762209e-34  4.282521 0.982 0.319  1.152442e-30 5       GTSE1
 6.966664e-28  4.572562 0.929 0.306  1.393333e-24 5       DEPDC1
 4.136647e-27  4.535806 0.929 0.370  8.273294e-24 5       KIF2C
 7.251863e-24  4.939070 0.893 0.338  1.450373e-20 5       SPC25
 1.163520e-22  4.535190 0.875 0.252  2.327039e-19 5       CDCA8
 2.392881e-20  4.379211 0.857 0.273  4.785761e-17 5       KIFC1
 6.460913e-19  4.278136 0.839 0.318  1.292183e-15 5       NUF2
 1.353696e-18  4.353691 0.839 0.290  2.707392e-15 5       SGOL1
 7.217916e-15  5.339904 0.804 0.266  1.443583e-11 5       ESCO2
 2.771655e-13  4.302038 0.786 0.164  5.543310e-10 5       PKMYT1
 2.998486e-12  4.369138 0.768 0.169  5.996972e-09 5       ASF1B
 1.713129e-05  4.548687 0.607 0.025  3.426258e-02 5       DTL
 3.868072e-03  4.385763 0.571 0.089  1.000000e+00 5       MCM10
 4.729120e-24  2.493509 1.000 0.525  9.458240e-21 6       S100A11
 1.251817e-23  2.440120 1.000 0.505  2.503634e-20 6       TMSB10
 2.780704e-21  2.084762 1.000 0.508  5.561408e-18 6       POLR2L
 6.562813e-21  1.564547 1.000 0.575  1.312563e-17 6       COTL1
 7.172092e-21  1.598519 0.974 0.537  1.434418e-17 6       SEC61G
 3.712187e-20  2.030556 1.000 0.483  7.424375e-17 6       TMSB4X
 6.169797e-20  1.715076 0.974 0.529  1.233959e-16 6       SH3BGRL3
 2.445759e-18  2.235110 0.921 0.450  4.891518e-15 6       MT2A
 2.484284e-18  1.844152 0.947 0.526  4.968567e-15 6       PFN1
 3.866244e-17  1.622470 0.974 0.526  7.732487e-14 6       C12orf75
 9.787282e-17  1.898866 0.921 0.481  1.957456e-13 6       TPM2
 4.953742e-16  1.665525 0.947 0.465  9.907485e-13 6       ANXA2
 7.889069e-16  1.602573 0.921 0.497  1.577814e-12 6       MYL12A
 3.524484e-07  1.639189 0.553 0.215  7.048967e-04 6       BIRC5
 2.001293e-03  1.548910 0.421 0.207  1.000000e+00 6       TRIP13
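
The grouped top-n selection can also be written without dplyr. The sketch below is a base-R version applied to six rows copied from the table above; the helper name top_n_per_cluster is ours. Unlike top_n, it does not keep extra rows when values are tied.

```r
# Six marker rows copied from the table above (two clusters, three genes each)
markers <- data.frame(
  gene     = c("GLRX", "GCLM", "CALM2", "OST4", "CAMK2N1", "MT-ND4"),
  cluster  = factor(c(0, 0, 0, 1, 1, 1)),
  avg_diff = c(0.5445789, 0.5419747, 0.5732416, 0.8086531, 0.6259537, 0.7583933),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n rows with the largest avg_diff in each cluster
top_n_per_cluster <- function(df, n) {
  by_cluster <- split(df, df$cluster)
  picked <- lapply(by_cluster, function(d) head(d[order(-d$avg_diff), ], n))
  do.call(rbind, picked)
}

top2 <- top_n_per_cluster(markers, 2)
print(top2)
```

Within each cluster the rows come back sorted from the largest to the smallest avg_diff, which is a convenient order for a heatmap.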

In this example, the DoHeatmap function generates an expression heatmap for the given cells and features. Here, we plot the top 15 markers per cluster from top15_logFC and save the figure as Cluster_heatmap_ctrl.png in the folder Figures.

We then write the full marker table to a CSV file, so that the marker results can be reused without rerunning FindAllMarkers. The clustered Seurat object caf_ctrl itself is read from ./data/caf_ctrl_clustered.RData at the start of this section, and the same load() call restores it in a later session.

In [31]:
# generate heatmap
heatmap_ctrl <- DoHeatmap(object = caf_ctrl, features = top15_logFC$gene)+ NoLegend()
heatmap_ctrl

#save plot
png(file = "./Figures/Cluster_heatmap_ctrl.png", width = 1024, height = 768)
print(heatmap_ctrl)
dev.off()

#save marker results
write.csv(caf_ctrl_markers, file = "./data/cluster_markers_ctrl.csv")
png: 2

The code below shows how to access data in various ways. First we save the names of the cells in clusters 0, 3, and 1. Next we store a list of genes of interest in the variable gene_list and use it to export expression values for these genes to one CSV file per cluster. Note that the values are read from the scale.data slot, so they are scaled expression values; read from the data slot instead if you want log-normalized values.

In [32]:
# Find cells in clusters 0, 3, and 1
clstr_0 <- names(caf_ctrl@active.ident[caf_ctrl@active.ident == 0])
clstr_3 <- names(caf_ctrl@active.ident[caf_ctrl@active.ident == 3])
clstr_1 <- names(caf_ctrl@active.ident[caf_ctrl@active.ident == 1])

gene_list <- c("EBP, FDFT1, LSS, MSMO1, SQLE, DHCR7, DHCR24, TM7SF2")
gene_list <- strsplit(gene_list, ",")[[1]]
gene_list <- gsub(" ", "", gene_list)

exp_mat_cls_0 <- caf_ctrl[["RNA"]]@scale.data[gene_list, clstr_0]
exp_mat_cls_3 <- caf_ctrl[["RNA"]]@scale.data[gene_list, clstr_3]
exp_mat_cls_1 <- caf_ctrl[["RNA"]]@scale.data[gene_list, clstr_1]

write.csv(exp_mat_cls_0, file = "./data/cluster0_norm_expr.csv")
write.csv(exp_mat_cls_3, file = "./data/cluster3_norm_expr.csv")
write.csv(exp_mat_cls_1, file = "./data/cluster1_norm_expr.csv")
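
Subsetting a matrix by gene name fails with a "subscript out of bounds" error if any requested gene is absent, which can happen when the scaled matrix holds only a subset of genes. A minimal defensive sketch, using a made-up toy matrix (toy_mat) in place of the real expression matrix:

```r
# Parse the comma-separated gene string into a clean character vector
gene_string <- "EBP, FDFT1, LSS, MSMO1, SQLE, DHCR7, DHCR24, TM7SF2"
gene_list <- trimws(strsplit(gene_string, ",")[[1]])

# Toy stand-in for a genes x cells expression matrix (values are made up)
toy_mat <- matrix(
  1:8, nrow = 4,
  dimnames = list(c("EBP", "LSS", "SQLE", "ACTA2"), c("cell1", "cell2"))
)

# Split the requested genes into those present and those missing
present      <- intersect(gene_list, rownames(toy_mat))
missing_gene <- setdiff(gene_list, rownames(toy_mat))

# drop = FALSE keeps a matrix even if only one gene is present
sub_mat <- toy_mat[present, , drop = FALSE]

print(present)
print(missing_gene)
```

Reporting the missing genes before writing the CSV files makes it clear when an export covers fewer genes than requested.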

DE using edgeR¶

As with bulk RNA-seq, we can perform DE analysis using the edgeR package. Although edgeR was developed for bulk RNA-seq data, it also works well in practice for single-cell data, provided the dataset is not so large that memory becomes a problem. Here, we simply want to give you a bit more practice with edgeR so that you can compare the results with what we got from Seurat. We use only clusters 0 and 3 as an example.

The following code takes about 5 minutes to run.

In [33]:
library(edgeR)
load("./data/caf_ctrl_clustered.RData")

# we only subset 2 clusters here
# Find cells in cluster 0 and 3
clstr_0 <- names(caf_ctrl@active.ident[caf_ctrl@active.ident == 0])
clstr_3 <- names(caf_ctrl@active.ident[caf_ctrl@active.ident == 3])
caf_sub <- caf_ctrl[, c(clstr_0, clstr_3)]

counts <- caf_sub[["RNA"]]@counts
group <- caf_sub@meta.data$seurat_clusters


# build the DGEList object
dge <- DGEList(counts = counts, 
               norm.factors = rep(1, ncol(counts)),
               group = group)

group_edgeR <- factor(group)
design <- model.matrix(~ group_edgeR)
dge <- estimateGLMCommonDisp(dge, design = design)
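
The call factor(group) above matters for the design matrix. If the cluster factor still carries levels for clusters that were removed by subsetting, model.matrix creates an all-zero column for each of them. A toy sketch with made-up labels (five cells from clusters 0 and 3, with seven declared levels):

```r
# Toy cluster labels: only clusters 0 and 3 occur, but levels 0..6 are declared
group <- factor(c(0, 0, 3, 3, 3), levels = 0:6)

# Design built from the factor as-is: one column per declared level
design_all <- model.matrix(~ group)

# Re-factoring drops the unused levels, leaving intercept + one contrast
group_edgeR <- factor(group)
design <- model.matrix(~ group_edgeR)

print(dim(design_all))
print(colnames(design))
```

With the reduced design, the second column is the cluster 3 versus cluster 0 effect.
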
In [34]:
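fit <- glmFit(dge, design)
# likelihood ratio test on the last design column (cluster 3 vs cluster 0)
res <- glmLRT(fit)
pVals <- res$table$PValue
names(pVals) <- rownames(res$table)

# Benjamini-Hochberg FDR adjustment
pVals <- p.adjust(pVals, method = "fdr")
head(as.data.frame(sort(pVals)), n = 30)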
A data.frame: 30 × 1
           sort(pVals)
                 <dbl>
PTGDS     0.000000e+00
HSP90AA1  0.000000e+00
UBC      8.113929e-290
MGP      8.998845e-284
ACTA2    1.457681e-268
APOE     2.315670e-253
CTSK     5.226153e-248
ITM2B    1.105687e-237
FDPS     8.370284e-209
CD63     3.255205e-202
C1R      2.398328e-188
SEPT7    1.637588e-184
PCOLCE   3.881557e-180
ACTG2    5.150734e-177
TUBA1A   3.995328e-174
C1S      3.359145e-172
CD9      3.431531e-172
HSP90AB1 6.449721e-171
HLA-A    5.632220e-169
LAPTM4A  6.260576e-164
FABP3    1.897097e-160
TMEM176B 1.205865e-159
TFPI2    3.239466e-156
CLU      2.488584e-155
FBLN1    4.415904e-151
LGALS3BP 3.465907e-147
GPNMB    9.587354e-144
AKAP12   4.262962e-134
CTSL     7.907964e-134
SERPINF1 1.720762e-130
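
The values above are adjusted p-values, not raw ones. In base R, method = "fdr" in p.adjust is an alias for the Benjamini-Hochberg procedure ("BH"). A small sketch with four made-up p-values shows what the adjustment does:

```r
# Four made-up raw p-values, already sorted for readability
p <- c(0.001, 0.01, 0.02, 0.8)

# "fdr" and "BH" name the same Benjamini-Hochberg adjustment
p_fdr <- p.adjust(p, method = "fdr")
p_bh  <- p.adjust(p, method = "BH")

print(p_fdr)
print(identical(p_fdr, p_bh))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the result is then made monotone, so an adjusted value is never smaller than the raw one.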

Integration analysis¶

Often we have not just one single-cell dataset to analyze, but several. While it can be useful to analyze each dataset separately, it is frequently also necessary to integrate them.

The Seurat package provides a method to integrate multiple datasets, even if they were collected from different individuals or under different environmental conditions. Datasets are harmonized using identified 'anchors': pairs of cells from different datasets that are judged to be in a matching biological state. Details of the integration method can be found in Stuart et al., 2019.
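
Anchors are built on the idea of mutual nearest neighbors: a cell in one dataset and a cell in another form a candidate pair when each is the other's closest match. The toy sketch below illustrates only that idea, on made-up one-dimensional values. Seurat itself searches in a reduced space shared by the datasets and then filters and scores the pairs, none of which is reproduced here.

```r
# Made-up one-dimensional "expression states" for cells in two datasets
a <- c(a1 = 0.1, a2 = 1.2, a3 = 2.2)
b <- c(b1 = 0.0, b2 = 2.0, b3 = 5.0)

# Pairwise distances: rows are cells of a, columns are cells of b
d <- abs(outer(a, b, "-"))

# Nearest neighbor in b for each cell of a, and in a for each cell of b
nn_ab <- apply(d, 1, which.min)
nn_ba <- apply(d, 2, which.min)

# A pair is mutual when the nearest neighbor of a cell points back to it
is_mutual <- unname(nn_ba[nn_ab]) == seq_along(a)

anchors <- data.frame(
  cell_a = names(a)[is_mutual],
  cell_b = names(b)[nn_ab[is_mutual]],
  stringsAsFactors = FALSE
)
print(anchors)
```

Here a2 is closest to b2, but b2 is closer to a3, so a2 does not form a pair. Cells without a counterpart in the other dataset are left unpaired in the same way.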

Next we will use what we have already learned and analyze three datasets together, eventually clustering them together. The data here are the control (untreated) CAF cells we worked with earlier, DHT (dihydrotestosterone) treated CAFs and E2 (estradiol) treated CAFs.

Setup Seurat object for 3 datasets¶

As we did for the "Control" group, we read in the data and construct a Seurat object for each group.

Each of the code cells below takes about 5 minutes to run.

In [35]:
library(Seurat)
# ******************** Merge data from all experiments using Seurat ********************** #
data_path="/anvil/projects/x-tra220018/current/datasets/single_cellData/Ratliff_CAF/results"

caf_ctrl_data <- Read10X(data.dir = paste0(data_path, "/Control-CAF/outs/filtered_feature_bc_matrix"))
caf_ctrl      <- CreateSeuratObject(counts = caf_ctrl_data, project = "CAFCTRL")

caf_dht_data  <- Read10X(data.dir = paste0(data_path, "/DHT-CAF/outs/filtered_feature_bc_matrix"))
caf_dht       <- CreateSeuratObject(counts = caf_dht_data, project = "CAFDHT")

caf_e2_data   <- Read10X(data.dir = paste0(data_path, "/E2-CAF/outs/filtered_feature_bc_matrix"))
caf_e2        <- CreateSeuratObject(counts = caf_e2_data, project = "CAFE2")
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
In [36]:
# Combine data
caf_combine <- merge(x = caf_ctrl, y = c(caf_dht, caf_e2), merge.data = TRUE,
                     add.cell.ids = c("ctrl", "dht", "e2"), project = "CAF1")

#get count matrix
caf_exprs_mat <- caf_combine[["RNA"]]@counts

# save data
save(caf_combine, file = "./data/caf_combine.RData")
save(caf_exprs_mat, file = "./data/caf_exprs_mat.RData")

Here, we load our combined object, then split it into a list of Seurat objects, one per original dataset (condition). This lets us process all datasets with a single lapply call, cutting down on the lines of code we write. We will address these goals for integration analysis:

  • Identify cell types that are present in all datasets;
  • Identify cell type markers.

Dataset preprocessing¶

In [37]:
library(cowplot)
load("./data/caf_combine.RData")
caf.list <- SplitObject(caf_combine, split.by = "orig.ident")
caf.list
$CAFCTRL
An object of class Seurat 
33694 features across 3321 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)

$CAFDHT
An object of class Seurat 
33694 features across 4166 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)

$CAFE2
An object of class Seurat 
33694 features across 4932 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)

We will process the data just as we did previously, using the functions NormalizeData and FindVariableFeatures, this time wrapped in a function that is applied to all three datasets.

In [38]:
# -------- standard pre-processing and identify features ----------- #
caf.list <- lapply(X = caf.list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
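
The same split-then-lapply pattern can be tried on a plain data frame. In the sketch below the data frame cells and its columns are made up, and log1p stands in for the per-dataset processing step. Ending the function body with x makes the return value explicit.

```r
# Made-up per-cell table with a dataset label and a total count
cells <- data.frame(
  orig.ident = c("CAFCTRL", "CAFDHT", "CAFE2", "CAFCTRL"),
  nCount     = c(100, 250, 80, 300),
  stringsAsFactors = FALSE
)

# Split into one data frame per dataset, analogous to SplitObject
cell.list <- split(cells, cells$orig.ident)

# Apply the same processing to every element and return the modified element
cell.list <- lapply(X = cell.list, FUN = function(x) {
  x$logCount <- log1p(x$nCount)
  x
})

print(names(cell.list))
print(cell.list$CAFCTRL)
```

The result is a named list with the same names as before, so later steps can keep referring to each dataset by its condition label.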

Perform integration¶

Now, we will perform an integrated analysis, where we use methods described in Stuart et al, 2019 to integrate the three datasets together, performing clustering analyses on all three datasets combined. When clustering multiple datasets together, we want a method that preserves the individual features of each dataset, while correcting for batch effects and allowing identification of shared features. The present method is a marked improvement over previous methods, which tended to overcorrect for differences between datasets.

The FindIntegrationAnchors function identifies a set of anchors that can be used later to co-cluster the datasets.

The following code will take several minutes to complete. It will also generate multiple messages as it runs.

In [39]:
# ---------- Integrate datasets with identified anchors -------- #
caf.anchors <- FindIntegrationAnchors(object.list = caf.list, dims = 1:20)
Computing 2000 integration features

Scaling features for provided objects

Finding all pairwise anchors

Running CCA

Merging objects

Finding neighborhoods

Finding anchors

	Found 11176 anchors

Filtering anchors

	Retained 7559 anchors

Running CCA

Merging objects

Finding neighborhoods

Finding anchors

	Found 11802 anchors

Filtering anchors

	Retained 7660 anchors

Running CCA

Merging objects

Finding neighborhoods

Finding anchors

	Found 13383 anchors

Filtering anchors

	Retained 9398 anchors

We now run the function IntegrateData in Seurat, which combines the data, using the precomputed anchor set.

In [40]:
caf.integrate <- IntegrateData(anchorset = caf.anchors, dims = 1:20)
Merging dataset 1 into 3

Extracting anchors for merged samples

Finding integration vectors

Finding integration vector weights

Integrating data

Merging dataset 2 into 3 1

Extracting anchors for merged samples

Finding integration vectors

Finding integration vector weights

Integrating data

Perform an integrated analysis¶

Now we simply run the standard workflow on the integrated dataset: scaling the data, running PCA, finding neighbors, finding clusters, and performing dimension reduction and visualization. The analysis and visualization methods are the same as for a single dataset, except that the analysis now runs on all cells of the larger integrated dataset. This time we also use UMAP instead of tSNE for dimension reduction.

Each of the code cells below takes several minutes to run.

In [41]:
# ------------ Run a single integrated analysis on all cells ------------ #
DefaultAssay(caf.integrate) <- "integrated"

# Run the standard workflow for visualization and clustering
caf.integrate <- ScaleData(caf.integrate, verbose = FALSE)
caf.integrate <- RunPCA(caf.integrate, npcs = 30, verbose = FALSE)
In [42]:
# UMAP and Clustering
caf.integrate <- RunUMAP(caf.integrate, reduction = "pca", dims = 1:20)
caf.integrate <- FindNeighbors(caf.integrate, reduction = "pca", dims = 1:20)
caf.integrate <- FindClusters(caf.integrate, resolution = 0.5)
save(caf.integrate, file = "./data/caf.integrate.RData")
00:59:15 UMAP embedding parameters a = 0.9922 b = 1.112

00:59:15 Read 12419 rows and found 20 numeric columns

00:59:15 Using Annoy for neighbor search, n_neighbors = 30

00:59:15 Building Annoy index with metric = cosine, n_trees = 50

0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|

00:59:17 Writing NN index file to temp file /tmp/RtmpP8q2j1/file24643c1637eb83

00:59:17 Searching Annoy index using 1 thread, search_k = 3000

00:59:20 Annoy recall = 100%

00:59:21 Commencing smooth kNN distance calibration using 1 thread
 with target n_neighbors = 30

00:59:22 Initializing from normalized Laplacian + noise (using irlba)

00:59:22 Commencing optimization for 200 epochs, with 516642 positive edges

00:59:27 Optimization finished

Computing nearest neighbor graph

Computing SNN

Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 12419
Number of edges: 419488

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8443
Number of communities: 12
Elapsed time: 1 seconds
In [43]:
# Visualization
DimPlot(caf.integrate, reduction = "umap", label = TRUE)

By adding an argument group.by or split.by, we can also visualize all 3 groups or split plots by group.

In [44]:
DimPlot(caf.integrate, reduction = "umap", group.by = "orig.ident")
In [45]:
DimPlot(caf.integrate, reduction = "umap", split.by = "orig.ident", ncol=3)

Identify conserved cell type markers¶

The FindConservedMarkers function performs differential expression analysis separately within each group and then combines the p-values using a meta-analysis method from the metap package (minimump by default, as reflected in the minimump_p_val column of the output).

Here, we use cluster 0 as an example and look for conserved markers, that is, genes that distinguish cluster 0 from all other cells in every condition. The cluster ID is specified with the ident.1 argument.

In [46]:
library(MetaDE)
library(metap)

# eg. in cluster 0
load("data/caf.integrate.RData")
DefaultAssay(caf.integrate) <- "RNA"
nk.markers <- FindConservedMarkers(caf.integrate, ident.1 = 0, grouping.var = "orig.ident", verbose = TRUE)
head(nk.markers)
Testing group CAFCTRL: (0) vs (4, 8, 6, 2, 7, 3, 10, 5, 9, 1, 11)

Testing group CAFDHT: (0) vs (9, 1, 7, 8, 5, 2, 6, 4, 11, 3, 10)

Testing group CAFE2: (0) vs (1, 9, 3, 6, 5, 2, 8, 7, 4, 10, 11)

A data.frame: 6 × 17
      CAFCTRL_p_val CAFCTRL_avg_log2FC CAFCTRL_pct.1 CAFCTRL_pct.2 CAFCTRL_p_val_adj
              <dbl>              <dbl>         <dbl>         <dbl>             <dbl>
NQO1   9.361404e-57          0.4115954         0.999         0.988      3.154232e-52
PRDX1  2.160477e-58          0.3049560         1.000         1.000      7.279511e-54
NNMT   1.076977e-67          0.4227141         1.000         0.989      3.628767e-63
GAPDH  1.439722e-58          0.2846855         1.000         1.000      4.851000e-54
GLRX   7.643490e-64          0.5499935         0.999         0.979      2.575397e-59
PTGR1  1.687555e-37          0.3379690         0.996         0.967      5.686047e-33

      CAFDHT_p_val CAFDHT_avg_log2FC CAFDHT_pct.1 CAFDHT_pct.2 CAFDHT_p_val_adj
             <dbl>             <dbl>        <dbl>        <dbl>            <dbl>
NQO1  2.573931e-67         0.4215092        0.998        0.960     8.672602e-63
PRDX1 4.667434e-60         0.2868176        1.000        0.998     1.572645e-55
NNMT  3.189462e-66         0.3712696        1.000        0.970     1.074657e-61
GAPDH 6.928142e-78         0.2838060        1.000        1.000     2.334368e-73
GLRX  1.014001e-71         0.5312733        0.998        0.961     3.416574e-67
PTGR1 2.197870e-44         0.3264536        0.987        0.922     7.405502e-40

        CAFE2_p_val CAFE2_avg_log2FC CAFE2_pct.1 CAFE2_pct.2 CAFE2_p_val_adj
              <dbl>            <dbl>       <dbl>       <dbl>           <dbl>
NQO1  3.375216e-103        0.5730114       0.997       0.972    1.137245e-98
PRDX1 2.652461e-100        0.3930062       1.000       0.999    8.937201e-96
NNMT   2.509703e-99        0.4596493       1.000       0.977    8.456192e-95
GAPDH  9.438581e-95        0.2952283       1.000       1.000    3.180235e-90
GLRX   4.764570e-85        0.5220748       0.999       0.965    1.605374e-80
PTGR1  1.357667e-81        0.5171790       0.985       0.933    4.574523e-77

          max_pval minimump_p_val
             <dbl>          <dbl>
NQO1  9.361404e-57  1.012565e-102
PRDX1 2.160477e-58  7.957382e-100
NNMT  3.189462e-66   7.529108e-99
GAPDH 1.439722e-58   2.831574e-94
GLRX  7.643490e-64   1.429371e-84
PTGR1 1.687555e-37   4.073001e-81
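
The two summary columns can be reproduced by hand from the three per-condition p-values. The sketch below does this for NQO1, assuming the minimum-p (Tippett) combination 1 - (1 - min p)^k over k groups. It is evaluated with log1p and expm1 because the direct formula rounds to zero for p-values this small. The variable names are ours.

```r
# Per-condition p-values for NQO1, copied from the output above
p <- c(CAFCTRL = 9.361404e-57, CAFDHT = 2.573931e-67, CAFE2 = 3.375216e-103)

# max_pval: the least significant of the per-condition p-values
max_pval <- max(p)

# Minimum-p combination, 1 - (1 - min(p))^k, computed in a numerically stable way
k <- length(p)
minimump_p_val <- -expm1(k * log1p(-min(p)))

print(max_pval)
print(minimump_p_val)
```

A small max_pval is the stricter criterion for a conserved marker, since it requires the gene to be significant in every condition rather than in at least one.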

We can also explore the distribution of marker genes that differentiate each cluster. By repeating the code above and changing ident.1 from 0 through 11, we can get a list of markers for each cluster. Below, we visualize only one of the most significant marker genes from each of the first nine clusters. You can also visualize any gene of interest.

In [47]:
# explore marker genes for each cluster (first 9 clusters)
FeaturePlot(caf.integrate, features = c("MALAT1","S100A4","ANXA2","RPS15","YBX3","PHKG1","MFAP4","BIRC5","FABP3"), 
            min.cutoff = "q9")

Identify differentially expressed genes between conditions¶

We can also explore the genes that are differentially expressed between the control, DHT, and E2 conditions within cells of the same type. First, we create a new metadata column in caf.integrate that contains both cell type and condition information. The code below combines the cell type cluster ID with the condition name and sets the result as the new identity.

In [48]:
load("data/caf.integrate.RData")
caf.integrate$celltype.caf <- paste(Idents(caf.integrate), caf.integrate$orig.ident, sep = "_")
caf.integrate$celltype <- Idents(caf.integrate)
Idents(caf.integrate) <- "celltype.caf"
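
The combined labels are plain strings of the form cluster_condition. A toy sketch of the same paste call on made-up vectors (four cells) shows what the new identities look like and how one group is selected:

```r
# Made-up cluster IDs and condition labels for four cells
cluster   <- factor(c(0, 6, 6, 1))
condition <- c("CAFCTRL", "CAFE2", "CAFCTRL", "CAFDHT")

# Combine cluster ID and condition into one label per cell
celltype.caf <- paste(cluster, condition, sep = "_")
print(celltype.caf)

# Cells that would be selected by an identity such as "6_CAFE2"
is_6_e2 <- celltype.caf == "6_CAFE2"
print(which(is_6_e2))
```

These strings are the values later passed as ident.1 and ident.2 when comparing conditions within a cluster.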

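Then we use the FindMarkers function to find genes that are differentially expressed between E2 and control for cluster 6, and print up to 15 of the top genes. Because the object was reloaded in the previous cell, its default assay is "integrated", so only the integration features are tested here; set DefaultAssay(caf.integrate) <- "RNA" first to test all genes. You can try this yourself with other clusters of interest.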

In [49]:
cluster6_ctrl_E2 <- FindMarkers(caf.integrate, ident.1 = "6_CAFE2", ident.2 = "6_CAFCTRL", verbose = FALSE)
head(cluster6_ctrl_E2, n = 15)
A data.frame: 2 × 5
            p_val avg_log2FC pct.1 pct.2 p_val_adj
            <dbl>      <dbl> <dbl> <dbl>     <dbl>
ACTA2  0.01489574  0.3238482 0.994 1.000         1
IGFBP3 0.43156318  0.2957208 0.933 0.992         1
In [50]:
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.10 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /apps/spack/anvil/apps/openblas/0.3.17-gcc-11.2.0-2qrsari/lib/libopenblas_zenp-r0.3.17.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] getPass_0.2-4   fansi_1.0.6     crayon_1.5.3    digest_0.6.36  
 [5] utf8_1.2.4      IRdisplay_1.1   repr_1.1.6      lifecycle_1.0.3
 [9] jsonlite_1.8.8  evaluate_0.24.0 pillar_1.8.1    rlang_1.1.0    
[13] cli_3.6.1       uuid_1.2-0      vctrs_0.6.1     IRkernel_1.3.2 
[17] tools_4.1.0     glue_1.7.0      fastmap_1.1.1   compiler_4.1.0 
[21] base64enc_0.1-3 pbdZMQ_0.3-11   htmltools_0.5.4